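These are my notes on Fuzzing101 with LibAFL - Part I: Fuzzing Xpdf [1] and Fuzzing101 with LibAFL - Part I.V: Speed Improvements to Part I [2]. LibAFL gives you a great deal of freedom, which I expect makes the learning curve steep. This time I'm aiming for a working understanding rather than every detail.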
Reproduction
First, download xpdf:
cd fuzzing-101-solutions/exercise-1
wget https://dl.xpdfreader.com/old/xpdf-3.02.tar.gz
tar xvf xpdf-3.02.tar.gz
rm xpdf-3.02.tar.gz
mv xpdf-3.02 xpdf
build.rs essentially performs the following steps:
# these are example commands that will be executed automatically by build.rs
# and were taken almost verbatim from Fuzzing101's README
cd fuzzing-101-solutions/exercise-1/xpdf
make clean
rm -rf install
export LLVM_CONFIG=llvm-config-15
CC=afl-clang-fast CXX=afl-clang-fast++ ./configure --prefix=./install
make
make install
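I copied build.rs as-is for now. Below is a rough sketch of how a build script could drive the configure step above with `std::process::Command`. The helper name `configure_command` is hypothetical. The sketch only constructs the command and does not run it, and it is not the tutorial's actual build.rs.

```rust
use std::path::Path;
use std::process::Command;

/// Hypothetical build.rs helper: builds (but does not run) the `./configure`
/// step from the commands above, with the AFL++ wrappers as CC/CXX.
/// `xpdf_dir` should be absolute: how a relative program path interacts with
/// `current_dir` is platform specific.
fn configure_command(xpdf_dir: &Path) -> Command {
    let mut cmd = Command::new(xpdf_dir.join("configure"));
    cmd.current_dir(xpdf_dir)
        .env("LLVM_CONFIG", "llvm-config-15")
        .env("CC", "afl-clang-fast")
        .env("CXX", "afl-clang-fast++")
        .arg("--prefix=./install");
    cmd
}

fn main() {
    // A build script runs with the package root as its working directory.
    let xpdf = std::env::current_dir().unwrap().join("xpdf");
    let cmd = configure_command(&xpdf);
    // A real build.rs would call `cmd.status()` and check it, then run
    // `make` and `make install` in the same directory.
    println!("{:?}", cmd);
}
```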
I'll look at how build.rs actually implements this later. For now I just copied it.
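After copying the code, I found that the default libafl version was 0.10.1, which would not build, so I switched to 0.13.2. A lot had changed in between. For example, `libafl::bolts` became the separate crate `libafl_bolts`, and the executors changed too: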
> We deleted `TimeoutExecutor` and `TimeoutForkserverExecutor` and make it mandatory for `InProcessExecutor` and `ForkserverExecutor` to have the timeout. Now `InProcessExecutor` and `ForkserverExecutor` have the default timeout of 5 seconds.
After fixing a pile of issues by following the official examples, it compiles and runs:
cd exercise-1
cargo build --release
../target/release/exercise-one-solution
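To fuzz a different program, you only need to change the executor's parameters. Here they are:
let mut executor = ForkserverExecutor::builder()
    // the instrumented target binary
    .program("./xpdf/install/bin/pdftotext")
    // "@@" is replaced with the path of the file holding the current input
    .parse_afl_cmdline(["@@"])
    .coverage_map_size(MAP_SIZE)
    // the observers this executor updates on every run
    .build(tuple_list!(time_observer, edges_observer))
    .unwrap();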
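`parse_afl_cmdline(["@@"])` follows AFL's convention: an argument `@@` stands for "the file that contains the current input". The Makefile later cleans up `.cur_input*` files, which suggests that is where the input gets written. The helper below is a hypothetical, std-only illustration of that substitution, not LibAFL's implementation.

```rust
/// Hypothetical helper mirroring AFL's "@@" convention: every argument equal
/// to "@@" is replaced with the path of the file that holds the current input.
/// Also reports whether the target reads from a file at all (if not, an
/// AFL-style fuzzer would feed the input on stdin instead).
fn substitute_input_path(args: &[&str], input_path: &str) -> (Vec<String>, bool) {
    let mut uses_file = false;
    let rewritten = args
        .iter()
        .map(|&arg| {
            if arg == "@@" {
                uses_file = true;
                input_path.to_string()
            } else {
                arg.to_string()
            }
        })
        .collect();
    (rewritten, uses_file)
}

fn main() {
    // pdftotext is invoked as `pdftotext <input file>` in the config above.
    let (args, uses_file) = substitute_input_path(&["@@"], ".cur_input");
    println!("{:?} (reads from file: {})", args, uses_file);
}
```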
Workflow
Let's walk through the workflow:
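- Corpus
  - `corpus_dirs: vec!`: the seed directories;
  - `input_corpus: InMemoryCorpus`: the corpus kept in memory;
  - `timeouts_corpus: OnDiskCorpus`: the on-disk corpus of inputs that satisfy the objective;
- Observer
  - `time_observer: TimeObserver`: records execution time;
  - `edges_observer: HitcountsMapObserver, StdMapObserver`: records edge coverage;
- Feedback
  - `feedback: MaxMapFeedback, TimeFeedback`: the feedback that decides which inputs are interesting;
    - combines `edges_observer` and `time_observer`;
  - `objective: CrashFeedback, TimeoutFeedback`: the feedback that selects inputs meeting the objective (a timeout or a crash);
- Monitor: tracks all fuzzing clients
  - `monitor: SimpleMonitor`: `SimpleMonitor` is used here to print reports to the terminal;
- Event Manager
  - `mgr: SimpleEventManager`: one of the three core components. Here `monitor` is used to build the simplest manager, `SimpleEventManager`;
- State
  - `state: StdState`: one of the three core components. It holds the information the fuzzer needs while running;
    - combines `input_corpus`, `timeouts_corpus`, `feedback`, and `objective`;
- Scheduler
  - `scheduler: IndexesLenTimeMinimizerScheduler`: the scheduling policy. The author uses `IndexesLenTimeMinimizerScheduler` to favor the fastest and smallest seeds;
- Fuzzer
  - `fuzzer: StdFuzzer`: one of the three core components. It drives test-case generation and processes the state and feedback after each execution;
    - combines `scheduler`, `feedback`, and `objective`;
- Executor
  - `executor: ForkserverExecutor`: the executor;
    - specifies the program to run and its arguments;
    - combines `time_observer` and `edges_observer`;
- Load the corpus
- Mutator
  - `mutator: StdScheduledMutator`: the mutator;
- Stage
  - `stage: StdMutationalStage`: an operation performed on a single input. Here it applies the mutator to the input;
- Run the fuzzer
  - combines `stages`, `executor`, `state`, and `mgr`.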
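The list above gives a rough mental model of how the pieces cooperate. Here is a std-only toy in Rust that runs them in one loop. All names are hypothetical, including `Rng`, `target`, `Report` and the magic prefix `BUG`. The scheduler is plain round-robin rather than `IndexesLenTimeMinimizerScheduler`, and the mutator overwrites one random byte rather than applying `StdScheduledMutator`'s mutations. None of this is LibAFL code.

```rust
const MAP_SIZE: usize = 16;

#[derive(Debug, Clone, Copy, PartialEq)]
enum ExitKind {
    Ok,
    Crash,
}

/// Target + observer: each matched byte of the magic prefix bumps one
/// coverage-map entry; matching the whole prefix "crashes".
fn target(input: &[u8], map: &mut [u8; MAP_SIZE]) -> ExitKind {
    for (i, &want) in b"BUG".iter().enumerate() {
        if input.get(i) != Some(&want) {
            return ExitKind::Ok;
        }
        map[i] = map[i].saturating_add(1);
    }
    ExitKind::Crash
}

/// Xorshift PRNG standing in for StdRand.
struct Rng(u64);

impl Rng {
    fn next(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
    fn below(&mut self, n: usize) -> usize {
        (self.next() % n as u64) as usize
    }
}

struct Report {
    execs: usize,
    corpus: Vec<Vec<u8>>,    // input corpus
    solutions: Vec<Vec<u8>>, // objective corpus
}

fn fuzz(seed: Vec<u8>, max_execs: usize) -> Report {
    assert!(!seed.is_empty(), "the mutator needs at least one byte");
    let mut rng = Rng(0x2545_F491_4F6C_DD1D);
    let mut corpus = vec![seed];
    let mut solutions = Vec::new();
    let mut history = [0u8; MAP_SIZE]; // feedback state kept across runs
    let mut execs = 0;
    while execs < max_execs && solutions.is_empty() {
        // Scheduler: plain round-robin over the corpus.
        let mut input = corpus[execs % corpus.len()].clone();
        // Mutator: overwrite one random byte.
        let pos = rng.below(input.len());
        input[pos] = rng.next() as u8;
        // Executor: run the target with a fresh coverage map.
        let mut map = [0u8; MAP_SIZE];
        let exit = target(&input, &mut map);
        execs += 1;
        // Objective: crashing inputs become solutions.
        if exit == ExitKind::Crash {
            solutions.push(input);
            continue;
        }
        // Feedback: interesting if any entry beats the best value seen so far.
        let mut interesting = false;
        for (best, &hit) in history.iter_mut().zip(map.iter()) {
            if hit > *best {
                *best = hit;
                interesting = true;
            }
        }
        if interesting {
            corpus.push(input);
        }
    }
    Report { execs, corpus, solutions }
}

fn main() {
    let r = fuzz(b"AAAA".to_vec(), 1_000_000);
    println!("execs={} corpus={} solutions={:?}", r.execs, r.corpus.len(), r.solutions);
}
```

Starting from `AAAA`, coverage feedback first keeps an input starting with `B`, then one starting with `BU`. These stepping stones are what let the loop reach the crashing `BUG` prefix incrementally, instead of guessing all three bytes at once.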
This was still hard to follow, so I drew a construction diagram that reads top to bottom and left to right.
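Modifications
After running for a while, I got only timeouts and no crashes. This is because the author configured only `TimeoutFeedback` as the objective, and on a fast machine the target crashes before it times out. So I recommend adding `CrashFeedback` as well. First, let's understand the original feedback and how it is used: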
// A Feedback, in most cases, processes the information reported by one or more observers to
// decide if the execution is interesting. This one is composed of two Feedbacks using a logical
// OR.
//
// Due to the fact that TimeFeedback can never classify a testcase as interesting on its own,
// we need to use it alongside some other Feedback that has the ability to perform said
// classification. These two feedbacks are combined to create a boolean formula, i.e. if the
// input triggered a new code path, OR, false.
let mut feedback = feedback_or!(
// New maximization map feedback (attempts to maximize the map contents) linked to the
// edges observer. This one will track indexes, but will not track novelties,
// i.e. new_tracking(... true, false).
MaxMapFeedback::new(&edges_observer),
// Time feedback, this one never returns true for is_interesting. However, it does keep
// track of testcase execution time by way of its TimeObserver
TimeFeedback::new(&time_observer)
);
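Two feedbacks are combined here. As the comments explain, `TimeFeedback` cannot decide on its own whether a testcase is interesting, so the `feedback_or!` macro is used. An input counts as interesting whenever `MaxMapFeedback` finds that it triggered new coverage.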
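The non-fast `feedback_or!` evaluates both sides, unlike `feedback_or_fast!`, so `TimeFeedback` still records the execution time even when `MaxMapFeedback` has already said yes. Below is a std-only sketch of that combination. The trait and the types `MaxMap`, `TimeRecorder` and `EagerOr` are hypothetical stand-ins, not LibAFL's `Feedback` trait.

```rust
use std::time::Duration;

/// What one execution reports to the feedbacks (the observers' output).
struct Run {
    map: Vec<u8>,
    exec_time: Duration,
}

trait Feedback {
    fn is_interesting(&mut self, run: &Run) -> bool;
}

/// Like MaxMapFeedback: interesting if any entry beats the best value seen so far.
struct MaxMap {
    history: Vec<u8>,
}

impl Feedback for MaxMap {
    fn is_interesting(&mut self, run: &Run) -> bool {
        let mut novel = false;
        for (best, &hit) in self.history.iter_mut().zip(&run.map) {
            if hit > *best {
                *best = hit;
                novel = true;
            }
        }
        novel
    }
}

/// Like TimeFeedback: never interesting on its own, it only records the time.
struct TimeRecorder {
    last: Option<Duration>,
}

impl Feedback for TimeRecorder {
    fn is_interesting(&mut self, run: &Run) -> bool {
        self.last = Some(run.exec_time);
        false
    }
}

/// Eager OR: both sides always run, so TimeRecorder sees every execution.
struct EagerOr<A, B>(A, B);

impl<A: Feedback, B: Feedback> Feedback for EagerOr<A, B> {
    fn is_interesting(&mut self, run: &Run) -> bool {
        let a = self.0.is_interesting(run);
        let b = self.1.is_interesting(run);
        a || b
    }
}

fn main() {
    let mut fb = EagerOr(MaxMap { history: vec![0; 4] }, TimeRecorder { last: None });
    let first = Run { map: vec![1, 0, 0, 0], exec_time: Duration::from_micros(120) };
    let repeat = Run { map: vec![1, 0, 0, 0], exec_time: Duration::from_micros(80) };
    println!("first run interesting: {}", fb.is_interesting(&first));
    println!("repeat run interesting: {}", fb.is_interesting(&repeat));
    println!("last recorded time: {:?}", fb.1.last);
}
```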
At first I misread `TimeFeedback` as `TimeoutFeedback`, but they are different things. `TimeFeedback` never returns true; what it does is track each input's execution time.
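Being interesting is not enough. We also want to keep the inputs that meet our goal. Here, for example, the author saves seeds that trigger timeouts: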
// A feedback is used to choose if an input should be added to the corpus or not. In the case
// below, we're saying that in order for a testcase's input to be added to the corpus, it must:
// 1: be a timeout
// AND
// 2: have created new coverage of the binary under test
//
// The goal is to do similar deduplication to what AFL does
//
// The feedback_and_fast macro combines the two feedbacks with a fast AND operation, which
// means only enough feedback functions will be called to know whether or not the objective
// has been met, i.e. short-circuiting logic.
let mut objective =
feedback_and_fast!(TimeoutFeedback::new(), MaxMapFeedback::new(&edges_observer));
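Here the author uses `feedback_and_fast!` to impose two constraints: the input must time out, and it must reach new coverage. The point is to deduplicate results in a way similar to AFL.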
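"Fast" means short-circuiting. If the run did not time out, the coverage check is never consulted, so that check's history only ever reflects timeout runs. The std-only sketch below shows this with a call counter. `NewEdges` and `timeout_and_new_coverage` are hypothetical stand-ins for `MaxMapFeedback` and the macro, not LibAFL code.

```rust
use std::collections::HashSet;

#[derive(Clone, Copy, PartialEq)]
enum Exit {
    Ok,
    Timeout,
}

struct Run {
    exit: Exit,
    edges: Vec<usize>, // edges hit by this execution
}

/// Coverage novelty check with its own history (stand-in for MaxMapFeedback).
struct NewEdges {
    seen: HashSet<usize>,
    calls: usize,
}

impl NewEdges {
    fn is_interesting(&mut self, run: &Run) -> bool {
        self.calls += 1;
        let mut novel = false;
        for &edge in &run.edges {
            novel |= self.seen.insert(edge); // insert always runs: |= does not short-circuit
        }
        novel
    }
}

/// Fast AND: the coverage check only runs when the execution timed out.
fn timeout_and_new_coverage(cov: &mut NewEdges, run: &Run) -> bool {
    run.exit == Exit::Timeout && cov.is_interesting(run)
}

fn main() {
    let mut cov = NewEdges { seen: HashSet::new(), calls: 0 };
    let runs = [
        Run { exit: Exit::Ok, edges: vec![1, 2] },      // skipped: no timeout
        Run { exit: Exit::Timeout, edges: vec![1] },    // new edge 1
        Run { exit: Exit::Timeout, edges: vec![1] },    // duplicate
        Run { exit: Exit::Timeout, edges: vec![1, 2] }, // edge 2 is new to this history
    ];
    for run in &runs {
        println!("{}", timeout_and_new_coverage(&mut cov, run));
    }
    println!("coverage check called {} times", cov.calls);
}
```

Edge 2 counts as new in the last run because the first run, which did not time out, never reached the coverage check.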
Finally, both `feedback` and `objective` are passed to the `state`:
//
// Component: State
//
// Creates a new State, taking ownership of all of the individual components during fuzzing.
//
// On the initial pass, setup_restarting_mgr_std returns (None, LlmpRestartingEventManager).
// On each successive execution (i.e. on a fuzzer restart), it returns the state from the prior
// run that was saved off in shared memory. The code below handles the initial None value
// by providing a default StdState. After the first restart, we'll simply unwrap the
// Some(StdState) returned from the call to setup_restarting_mgr_std
let mut state = StdState::new(
// random number generator with a time-based seed
StdRand::with_seed(current_nanos()),
input_corpus,
timeouts_corpus,
// States of the feedbacks that store the data related to the feedbacks that should be
// persisted in the State.
&mut feedback,
&mut objective,
)
.unwrap();
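The first argument is the random number generator, the second is the corpus, and the third is the corpus where inputs satisfying the objective are saved. Of the last two arguments, `feedback` decides which seeds are interesting, and `objective` decides which seeds meet our goal and should be saved.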
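Now that we understand the author's intent, the next step is obvious. Besides timeouts, we certainly want inputs that cause crashes, and `CrashFeedback` is clearly what we need. So how do we use it?
Following the official examples again: they use the `feedback_or_fast!` macro to select seeds that trigger either a crash or a timeout: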
// A feedback to choose if an input is a solution or not
let mut objective = feedback_or_fast!(CrashFeedback::new(), TimeoutFeedback::new());
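We follow the same pattern: import `CrashFeedback` and change the objective accordingly. After the change, running the fuzzer successfully saves inputs that trigger crashes.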
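`CrashFeedback` and `TimeoutFeedback` both look only at how the execution ended. The Unix-only sketch below shows one plausible way a forkserver-style runner could map a child's wait status to such an outcome. Both `classify` and `ExitKind` are hypothetical; the real `ForkserverExecutor` logic is not reproduced here. Note that a non-zero exit code is treated as a normal exit, and only termination by a signal counts as a crash.

```rust
use std::os::unix::process::ExitStatusExt;
use std::process::ExitStatus;

#[derive(Debug, Clone, Copy, PartialEq)]
enum ExitKind {
    Ok,
    Crash,
    Timeout,
}

/// Hypothetical classification of one child run. The timeout check comes
/// first: a child the fuzzer killed for running too long also shows up as
/// "terminated by a signal".
fn classify(status: ExitStatus, timed_out: bool) -> ExitKind {
    if timed_out {
        ExitKind::Timeout
    } else if status.signal().is_some() {
        ExitKind::Crash // e.g. SIGSEGV or SIGABRT
    } else {
        ExitKind::Ok // a normal exit, whatever the exit code
    }
}

/// Stand-in for feedback_or_fast!(CrashFeedback::new(), TimeoutFeedback::new()):
/// the first check that says yes decides, the second is skipped.
fn is_solution(kind: ExitKind) -> bool {
    kind == ExitKind::Crash || kind == ExitKind::Timeout
}

fn main() {
    // Raw wait statuses (Linux encoding): the low 7 bits hold the terminating
    // signal, the next byte holds the exit code.
    let cases = [
        (ExitStatus::from_raw(11), false),    // killed by SIGSEGV
        (ExitStatus::from_raw(1 << 8), false), // exit(1)
        (ExitStatus::from_raw(9), true),      // killed by the fuzzer after the timeout
    ];
    for (status, timed_out) in cases {
        let kind = classify(status, timed_out);
        println!("{:?} -> solution: {}", kind, is_solution(kind));
    }
}
```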
Coming back to this post a few days later, I had already forgotten the build commands, so I'm recording them here:
cargo build --release
../target/release/exercise-one-solution
Open questions
That leaves the following questions:
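- Judging from the state's arguments, timeouts and crashes appear to be saved in the same directory. Under libafl's design, this directory holds the inputs that meet our objective. Can we specify separate directories for crashes and timeouts?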
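- After saving qualifying inputs with `feedback_or_fast!(CrashFeedback::new(), TimeoutFeedback::new());`, how does libafl deduplicate them? The earlier `feedback_and_fast!(TimeoutFeedback::new(), MaxMapFeedback::new(&edges_observer));` deduplicates by whether a new path was found. With the new objective, will an input that times out without finding a new path also be saved? If everything is saved, could we wrap the `feedback_or_fast!` in a `feedback_and_fast!` together with a `MaxMapFeedback` to deduplicate?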
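To make the second question concrete, here is a toy, std-only model of the composition it proposes: (crash OR timeout) AND new coverage. It does not claim that libafl behaves exactly like this; that is still an open question here. The types `Run`, `DedupObjective` and `count_solutions` are hypothetical.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, Copy, PartialEq)]
enum ExitKind {
    Ok,
    Crash,
    Timeout,
}

struct Run {
    exit: ExitKind,
    edges: Vec<usize>, // edges hit by this execution
}

/// Toy objective: (crash OR timeout) AND new coverage. Its coverage history
/// only ever sees runs that already crashed or timed out, because the AND
/// short-circuits.
struct DedupObjective {
    seen: HashSet<usize>,
}

impl DedupObjective {
    fn is_solution(&mut self, run: &Run) -> bool {
        let crash_or_timeout = matches!(run.exit, ExitKind::Crash | ExitKind::Timeout);
        crash_or_timeout && {
            let mut novel = false;
            for &edge in &run.edges {
                novel |= self.seen.insert(edge);
            }
            novel
        }
    }
}

/// Returns (solutions kept by crash OR timeout alone, solutions kept with dedup).
fn count_solutions(runs: &[Run]) -> (usize, usize) {
    let plain = runs.iter().filter(|r| r.exit != ExitKind::Ok).count();
    let mut dedup = DedupObjective { seen: HashSet::new() };
    let deduped = runs.iter().filter(|r| dedup.is_solution(r)).count();
    (plain, deduped)
}

fn main() {
    let runs = vec![
        Run { exit: ExitKind::Crash, edges: vec![1, 2] },
        Run { exit: ExitKind::Crash, edges: vec![1, 2] },   // same path again
        Run { exit: ExitKind::Timeout, edges: vec![1, 2] }, // same path, now a timeout
        Run { exit: ExitKind::Crash, edges: vec![1, 3] },   // new edge 3
        Run { exit: ExitKind::Ok, edges: vec![4] },
    ];
    let (plain, deduped) = count_solutions(&runs);
    println!("without dedup: {}, with dedup: {}", plain, deduped);
}
```

In this model, crashes and timeouts share one coverage history. A timeout that covers the same edges as an earlier crash is therefore dropped, which ties back to the first question about keeping them apart.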
I'll work through these questions as I keep learning.
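Speeding it up
We can speed up fuzzing by using persistent mode instead of a forkserver. In Fuzzing101 with LibAFL - Part I.V: Speed Improvements to Part I, the author modifies the xpdf source and writes harness.cc; I won't go into that here.
Because the fuzzer is now compiled as a library, the main.rs from above is renamed to lib.rs. Let's look at what changed compared with the workflow above.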
Observer
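To compile the harness, the author uses `libafl_cc` instead of the `afl-clang-[fast|lto]` used earlier. With the traditional `afl-clang-[fast|lto]`, libafl shares the coverage map with the target through shared memory whose ID is passed in the `__AFL_SHM_ID` environment variable. With `libafl_cc`, the `EDGES_MAP` exposed by `libafl_targets` is used instead: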
let edges_observer =
HitcountsMapObserver::new(unsafe { std_edges_map_observer("edges") }).track_indices();
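`HitcountsMapObserver` wraps the raw edge map returned by `std_edges_map_observer`. After each run, it replaces the raw counters with AFL's hit-count buckets, so small changes in loop counts do not look like new coverage. The sketch mirrors AFL's bucket table in plain Rust. It is not `libafl_targets` code, and `bucket`/`classify_map` are hypothetical names.

```rust
/// AFL-style hit-count classes: exact counts collapse into buckets so that,
/// e.g., a loop running 5 vs 6 times is not reported as new coverage.
fn bucket(count: u8) -> u8 {
    match count {
        0 => 0,
        1 => 1,
        2 => 2,
        3 => 4,
        4..=7 => 8,
        8..=15 => 16,
        16..=31 => 32,
        32..=127 => 64,
        128..=255 => 128,
    }
}

/// Rewrites a raw edge-counter map in place, one bucket per entry.
fn classify_map(map: &mut [u8]) {
    for count in map.iter_mut() {
        *count = bucket(*count);
    }
}

fn main() {
    let mut map = [0u8, 1, 3, 5, 6, 200];
    classify_map(&mut map);
    println!("{:?}", map);
}
```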
Monitor
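To keep the target's output from getting mixed up with the fuzzer's, the author replaces `SimpleMonitor` with `MultiMonitor`. `MultiMonitor` displays per-client statistics as well as the totals across all clients.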
let monitor = MultiMonitor::new(|s| {
println!("{}", s);
});
Event Manager
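In this setup (`MultiMonitor` with the restarting event manager), two fuzzer instances must be started. According to the author, the first fuzzer started is also a client. But it opens a port and listens for messages from the other clients, so to me it looks much more like a server: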
let (state, mut mgr) = match setup_restarting_mgr_std(monitor, 1337, EventConfig::AlwaysUnique)
{
Ok(res) => res,
Err(err) => match err {
Error::ShuttingDown => {
return Ok(());
}
_ => {
panic!("Failed to setup the restarting manager: {}", err);
}
},
};
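The "first one is the server" observation matches a common way to bootstrap this kind of setup. The first process that manages to bind the port (1337 here) listens as the broker. Later processes find the port taken and connect as clients. The sketch below models only that decision with `std::net::TcpListener`. `Role` and `pick_role` are hypothetical names, and LibAFL's actual LLMP broker does far more than this (shared-memory message passing, client registration).

```rust
use std::io::ErrorKind;
use std::net::TcpListener;

/// Simplified role selection: whoever binds the port first acts as the
/// broker and keeps listening; everyone else becomes a client.
enum Role {
    Broker(TcpListener),
    Client,
}

fn pick_role(port: u16) -> std::io::Result<Role> {
    match TcpListener::bind(("127.0.0.1", port)) {
        Ok(listener) => Ok(Role::Broker(listener)),
        // Port already taken by an earlier instance: join it as a client.
        Err(e) if e.kind() == ErrorKind::AddrInUse => Ok(Role::Client),
        Err(e) => Err(e),
    }
}

fn main() -> std::io::Result<()> {
    match pick_role(1337)? {
        Role::Broker(listener) => println!("broker, listening on {}", listener.local_addr()?),
        Role::Client => println!("port taken: connecting as a client"),
    }
    Ok(())
}
```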
Harness
This component does not exist in the forkserver version; it is built specifically for `InProcessExecutor`. The `libfuzzer_test_one_input` it calls corresponds to the `LLVMFuzzerTestOneInput` we write in the harness:
let mut harness = |input: &BytesInput| {
let target = input.target_bytes();
let buffer = target.as_slice();
libfuzzer_test_one_input(buffer);
ExitKind::Ok
};
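The harness closure only turns the input into a pointer/length pair for the C entry point `int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)`. Here is a std-only sketch of that bridge. The Rust function `fake_test_one_input` is a hypothetical stand-in with the same C signature, and the one-variant `ExitKind` is a simplification of LibAFL's.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts bytes the fake target has seen, so the sketch has an observable effect.
static BYTES_SEEN: AtomicUsize = AtomicUsize::new(0);

/// Simplified exit status; LibAFL's ExitKind has more variants.
#[derive(Debug, PartialEq)]
enum ExitKind {
    Ok,
}

/// Hypothetical stand-in with the C signature of the libFuzzer entry point:
/// int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size);
extern "C" fn fake_test_one_input(data: *const u8, size: usize) -> i32 {
    // SAFETY: the caller passes a pointer/length pair taken from a live slice.
    let input = unsafe { std::slice::from_raw_parts(data, size) };
    BYTES_SEEN.fetch_add(input.len(), Ordering::SeqCst);
    0
}

/// Same shape as the harness closure above: borrow the bytes, hand pointer
/// and length to the C entry point, report a normal exit. Crashes are not
/// handled here; in the real setup the executor is responsible for them.
fn make_harness() -> impl Fn(&[u8]) -> ExitKind {
    |buffer: &[u8]| {
        fake_test_one_input(buffer.as_ptr(), buffer.len());
        ExitKind::Ok
    }
}

fn main() {
    let harness = make_harness();
    let exit = harness(&b"%PDF-1.4"[..]);
    println!("{:?}, bytes seen: {}", exit, BYTES_SEEN.load(Ordering::SeqCst));
}
```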
Executor
Since we now use persistent mode, the executor changes accordingly. Compared with `ForkserverExecutor`, `InProcessExecutor` needs more components:
let mut executor = InProcessExecutor::new(
&mut harness,
tuple_list!(edges_observer, time_observer),
&mut fuzzer,
&mut state,
&mut mgr,
)
.unwrap();
I'll study how this works in detail later.
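Running the Fuzzer
This part mainly adds a mechanism similar to `__AFL_LOOP`. It fixes how many iterations run before a restart and sets up the manual restart: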
fuzzer
.fuzz_loop_for(&mut stages, &mut executor, &mut state, &mut mgr, 10000)
.unwrap();
// Since we're using this fuzz_loop_for in a restarting scenario to only run for n iterations
// before exiting, we need to ensure we call on_restart() and pass it the state. This way, the
// state will be available in the next, respawned, iteration.
mgr.on_restart(&mut state).unwrap();
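The point of `on_restart` is that progress survives the respawn. Each child runs a fixed number of iterations, hands its state over, and exits, and the next child picks the state up. The std-only toy below models that cycle. `FuzzState`, `run_child` and `respawn` are hypothetical, and the `Option` stands in for the shared memory the code comments mention.

```rust
/// State that has to survive a restart (stand-in for StdState).
#[derive(Debug, Default, PartialEq)]
struct FuzzState {
    generation: u32,  // how many child processes have run so far
    total_execs: u64, // executions across all children
}

/// One child lifetime: restore the saved state (or start fresh), run
/// `iters` iterations like fuzz_loop_for, then hand the state back the way
/// mgr.on_restart(&mut state) saves it before the process exits.
fn run_child(saved: Option<FuzzState>, iters: u64) -> FuzzState {
    let mut state = saved.unwrap_or_default();
    state.generation += 1;
    for _ in 0..iters {
        state.total_execs += 1; // one fuzzing iteration
    }
    state
}

/// Stand-in for the restarting manager respawning children; `shared` plays
/// the role of the shared memory used to pass the state along.
fn respawn(children: u32, iters: u64) -> FuzzState {
    let mut shared: Option<FuzzState> = None;
    for _ in 0..children {
        shared = Some(run_child(shared.take(), iters));
    }
    shared.unwrap_or_default()
}

fn main() {
    println!("{:?}", respawn(3, 10_000));
}
```

If the state were not handed over, every child would start from `FuzzState::default()` again and the fuzzing progress would be lost at each restart.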
The author uses cargo make to drive the whole process, with each step defined in `Makefile.toml`:
[tasks.clean]
dependencies = ["cargo-clean", "afl-clean", "clean-xpdf"]
[tasks.afl-clean]
script = '''
rm -rf .cur_input* timeouts fuzzer fuzzer.o libexerciseonepointfive.a
'''
[tasks.clean-xpdf]
cwd = "xpdf"
script = """
make --silent clean
rm -rf built-with-* ../build/*
"""
[tasks.cargo-clean]
command = "cargo"
args = ["clean"]
[tasks.rebuild]
dependencies = ["afl-clean", "clean-xpdf", "build-compilers", "build-xpdf", "build-fuzzer"]
[tasks.build-compilers]
script = """
cargo build --release
cp -f ../target/release/libexerciseonepointfive.a .
"""
[tasks.build-xpdf]
cwd = "build"
script = """
cmake ../xpdf -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=$(pwd)/../../target/release/compiler -DCMAKE_CXX_COMPILER=$(pwd)/../../target/release/compiler_pp
make
"""
[tasks.build-fuzzer]
script = """
../target/release/compiler_pp -I xpdf/goo -I xpdf/fofi -I xpdf/splash -I xpdf/xpdf -I xpdf -o fuzzer harness.cc build/*/*.a -lm -ldl -lpthread -lstdc++ -lgcc -lutil -lrt
"""
Then run `cargo make rebuild` to execute all the steps and produce the `fuzzer` binary.
Finally, run the compiled fuzzer in two separate terminals. The one started first acts as the server.
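Summary
Using libafl, the author builds a complete fuzzing workflow that supports both forkserver-based and persistent-mode fuzzing. These notes briefly summarize how libafl's components are stacked together and how they are used, which shows just how flexible libafl is.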