深入理解 LibAFL：Mutator 篇（一）

前言

与 Executor 类似，如果我们想为 LibAFL 实现变异器，只需要实现 Mutator trait 和 Named trait 即可。

在 LibAFL 的源码中其实已经给出了 NopMutator 这一示例：

src/mutators/mod.rs

/// [`Mutator`] that does nothing, used for testing.
///
/// Example:
///
/// ```rust,ignore
/// let mut stages = tuple_list!(StdMutationalStage::new(NopMutator(MutationResult::Mutated)));
/// ```
#[derive(Debug, Copy, Clone)]
pub struct NopMutator {
    result: MutationResult,
}
 
impl NopMutator {
    /// The passed argument is returned every time the mutator is called.
    #[must_use]
    pub fn new(result: MutationResult) -> Self {
        Self { result }
    }
}
 
impl<I, S> Mutator<I, S> for NopMutator {
    fn mutate(&mut self, _state: &mut S, _input: &mut I) -> Result<MutationResult, Error> {
        Ok(self.result)
    }
    #[inline]
    fn post_exec(&mut self, _state: &mut S, _new_corpus_id: Option<CorpusId>) -> Result<(), Error> {
        Ok(())
    }
}
 
impl Named for NopMutator {
    fn name(&self) -> &Cow<'static, str> {
        &Cow::Borrowed("NopMutator")
    }
}

在 Mutator trait 中，需要实现 mutate 和 post_exec 这两个函数。对于 mutate 函数，它接收的参数是状态 State 和被选择变异的输入 input。在变异期间，我们需要直接修改 input 的数据，返回它是否变异的结果 MutationResult：

src/mutators/mod.rs

pub enum MutationResult {
    /// The [`Mutator`] executed on this `Input`. It may not guarantee that the input has actually been changed.
    Mutated,
    /// The [`Mutator`] did not mutate this `Input`. It was `Skipped`.
    Skipped,
}

有两种可能性，要么变异（MutationResult::Mutated），要么跳过变异（MutationResult::Skipped），也有可能在变异的时候出错，这个时候返回 Error。

对于 post_exec，大部分情况下都不需要实现。

除 Mutator trait 之外，还需要实现 Named trait，它返回的是变异器的名称。

Mutator 本身能讲的乏善可陈，因为它只负责按照固定的算法进行变异，根据策略选择种子的功能并不在它的身上实现。我们的重点应该放在分析他人实现的 Mutator 上。

HavocScheduledMutator

说到 Mutator，不得不提的就是 LibAFL 中最常用的随机变异器 HavocScheduledMutator（在之前的版本叫 StdScheduledMutator），它是仿照 AFL++ 实现的变异器的集合。除了随机变异外，它还支持接收字典进行变异，我们首先来看一下它的基础实现和 Mutator 相关的 trait 实现：

src/mutators/scheduled.rs

/// A [`Mutator`] that stacks embedded mutations in a havoc manner on each call.
#[derive(Debug)]
pub struct HavocScheduledMutator<MT> {
    name: Cow<'static, str>,
    mutations: MT,
    max_stack_pow: usize,
}
 
impl<MT> Named for HavocScheduledMutator<MT> {
    fn name(&self) -> &Cow<'static, str> {
        &self.name
    }
}
 
impl<I, MT, S> Mutator<I, S> for HavocScheduledMutator<MT>
where
    MT: MutatorsTuple<I, S>,
    S: HasRand,
{
    #[inline]
    fn mutate(&mut self, state: &mut S, input: &mut I) -> Result<MutationResult, Error> {
        self.scheduled_mutate(state, input)
    }
    #[inline]
    fn post_exec(&mut self, _state: &mut S, _new_corpus_id: Option<CorpusId>) -> Result<(), Error> {
        Ok(())
    }
}

HavocScheduledMutator 中的结构包含：

name：自身的名称
mutations：一个变异器元组（MutatorsTuple），可能包含不同的变异器
max_stack_pow：变异的次数（ $2^{n}$ 次）

HavocScheduledMutator 最常见的初始化形式是：

    // Setup a mutational stage with a basic bytes mutator
    let mutator = HavocScheduledMutator::new(havoc_mutations());

我们先来看 new 这个初始化函数：

src/mutators/scheduled.rs

impl<MT> HavocScheduledMutator<MT>
where
    MT: NamedTuple,
{
    /// Create a new [`HavocScheduledMutator`] instance specifying mutations
    pub fn new(mutations: MT) -> Self {
        HavocScheduledMutator {
            name: Cow::from(format!(
                "HavocScheduledMutator[{}]",
                mutations.names().join(", ")
            )),
            mutations,
            max_stack_pow: 7,
        }
    }

默认变异的最大次数是 $2^{7} = 128$ ，上文的 havoc_mutations() 返回的显然就是 MutatorsTuple，除此之外，还要求这个 MutatorsTuple 实现了 NamedTuple trait。继续看一下 havoc_mutations() 函数：

src/mutators/havoc_mutations.rs

/// Get the mutations that compose the Havoc mutator
#[must_use]
pub fn havoc_mutations() -> HavocMutationsType {
    havoc_mutations_no_crossover().merge(havoc_crossover())
}

这里分成了两种变异策略：

havoc_mutations_no_crossover()：变异单个输入；
havoc_crossover()：从 state 中选择种子和当前输入一起进行交叉变异。

我们看一下 havoc_mutations_no_crossover：

src/mutators/havoc_mutations.rs

/// Get the mutations that compose the Havoc mutator (only applied to single inputs)
#[must_use]
pub fn havoc_mutations_no_crossover() -> HavocMutationsNoCrossoverType {
    tuple_list!(
        BitFlipMutator::new(),
        ByteFlipMutator::new(),
        ... ...
        BytesInsertCopyMutator::new(),
        BytesSwapMutator::new(),
    )
}

显然这里的变异器就是 Bits、Bytes 等变异器，而对于 havoc_crossover：

src/mutators/havoc_mutations.rs

/// Get the mutations that compose the Havoc mutator's crossover strategy
#[must_use]
pub fn havoc_crossover() -> HavocCrossoverType {
    tuple_list!(
        CrossoverInsertMutator::new(),
        CrossoverReplaceMutator::new(),
    )
}

CrossoverInsertMutator 和 CrossoverReplaceMutator 分别进行了插入（随机种子）和替换的变异策略。

tokens_mutations

tokens_mutation 和 havoc_mutations 类似。我们先看一下它是如何使用的：

LibAFL/fuzzers/inprocess/libfuzzer_libmozjpeg/src/lib.rs

    // Setup a basic mutator with a mutational stage
    let mutator = 
HavocScheduledMutator::with_max_stack_pow(
    havoc_mutations().merge(tokens_mutations()), 6);

tokens_mutations 代码为：

src/mutators/scheduled.rs

/// Get the mutations that uses the Tokens metadata
#[must_use]
pub fn tokens_mutations() -> tuple_list_type!(TokenInsert, TokenReplace) {
    tuple_list!(TokenInsert::new(), TokenReplace::new())
}

那么，tokens 是在哪里传入，又是在哪里使用的呢？我们先看传入的代码：

LibAFL/fuzzers/inprocess/libfuzzer_libmozjpeg/src/lib.rs

    // Add the JPEG tokens if not existing
    if state.metadata_map().get::<Tokens>().is_none() {
        state.add_metadata(Tokens::from_file("./jpeg.dict")?);
    }

Tokens 在 fuzzer 初始化时保存在 state 中。在使用时呢？我们看看 TokenInsert 变异器的代码：

src/mutators/token_mutations.rs

impl<I, S> Mutator<I, S> for TokenInsert
where
    S: HasMetadata + HasRand + HasMaxSize,
    I: ResizableMutator<u8> + HasMutatorBytes,
{
    fn mutate(&mut self, state: &mut S, input: &mut I) -> Result<MutationResult, Error> {
        let max_size = state.max_size();
        let tokens_len = ...
        let token_idx = state.rand_mut().below(tokens_len);
 
        let size = input.mutator_bytes().len();
        // # Safety
        // after saturating add it's always above 0
 
        let off = state
            .rand_mut()
            .below(unsafe { NonZero::new(size.saturating_add(1)).unwrap_unchecked() });
 
        let meta = state.metadata_map().get::<Tokens>().unwrap();
        let token = &meta.tokens()[token_idx];
        ...

Tokens 是从 state 中取出的。

我们已经知道了 token 的使用方法，也发现了它可以从文件中传入。那么我们该传入什么样的格式呢？先去翻一下 jpeg.dict，发现它的格式是：

header_jfif="JFIF\x00"
header_jfxx="JFXX\x00"

section_ffc0="\xff\xc0"
section_ffc2="\xff\xc2"
...

在 tokens 中，value 是很重要的，那么 key 也很重要吗？我们不知道目标的结构信息，也可以指定 key 吗？抱着这样的疑惑，我去翻了 from_file 函数，它会调用 add_from_file。在该函数中有一段注释：

we are only interested in ’”…”’, not prefixed ‘foo = ’

说明它只对 value 感兴趣。在对该函数进行分析后，我们可以得到 tokens 文件的格式要求：

每行保存一个 token
可以是 key="value" 的格式，value 一定要用引号保存；
可以是 "value" 的格式
每行不允许出现 \" 这样的格式，且最后一个字符一定为 "

ScheduledMutator

通过前面的代码，我们已经初步了解了 HavocScheduledMutator 中 havoc_mutations 和 tokens_mutations 的用法。那 HavocScheduledMutator 中是怎样选择不同的变异器进行变异的呢？在上文中，我们看到为 HavocScheduledMutator 实现的 Mutator trait 中的 mutate 函数会调用 scheduled_mutate 函数：

src/mutators/scheduled.rs

impl<I, MT, S> Mutator<I, S> for HavocScheduledMutator<MT>
where
    MT: MutatorsTuple<I, S>,
    S: HasRand,
{
    #[inline]
    fn mutate(&mut self, state: &mut S, input: &mut I) -> Result<MutationResult, Error> {
        self.scheduled_mutate(state, input)
    }

scheduled_mutate 是在哪里实现的呢？向前跟踪，我们发现它是属于 ScheduledMutator 这个 trait 的：

src/mutators/scheduled.rs

/// A [`Mutator`] scheduling multiple [`Mutator`]s for an input.
pub trait ScheduledMutator<I, S>: ComposedByMutations + Mutator<I, S>
where
    Self::Mutations: MutatorsTuple<I, S>,
{
    /// Compute the number of iterations used to apply stacked mutations
    fn iterations(&self, state: &mut S, input: &I) -> u64;
 
    /// Get the next mutation to apply
    fn schedule(&self, state: &mut S, input: &I) -> MutationId;
 
    /// New default implementation for mutate.
    /// Implementations must forward `mutate()` to this method
    fn scheduled_mutate(&mut self, state: &mut S, input: &mut I) -> Result<MutationResult, Error> {
        let mut r = MutationResult::Skipped;
        let num = self.iterations(state, input);
        for _ in 0..num {
            let idx = self.schedule(state, input);
            let outcome = self.mutations_mut().get_and_mutate(idx, state, input)?;
            if outcome == MutationResult::Mutated {
                r = MutationResult::Mutated;
            }
        }
        Ok(r)
    }
}

scheduled_mutate 函数是这个 trait 的核心，它的工作是使用变异器进行若干次变异（变异次数由 iterations 指定），在每次变异时选取不同的变异器（由 schedule 指定）。对于 HavocScheduledMutator，它使用了默认的 scheduled_mutate 函数，自己实现了剩下的方法：

src/mutators/scheduled.rs

impl<I, MT, S> ScheduledMutator<I, S> for HavocScheduledMutator<MT>
where
    MT: MutatorsTuple<I, S>,
    S: HasRand,
{
    /// Compute the number of iterations used to apply stacked mutations
    fn iterations(&self, state: &mut S, _: &I) -> u64 {
        1 << (1 + state.rand_mut().below_or_zero(self.max_stack_pow))
    }
 
    /// Get the next mutation to apply
    fn schedule(&self, state: &mut S, _: &I) -> MutationId {
        debug_assert_ne!(self.mutations.len(), 0);
        // # Safety
        // We check for empty mutations
        state
            .rand_mut()
            .below(unsafe { NonZero::new(self.mutations.len()).unwrap_unchecked() })
            .into()
    }
}

变异次数就是传入的 max_stack_pow 进行 2 的幂次再加一，而选择子变异器的概率则是根据 state 中的随机数选择器进行选择。

简要总结

我们从 NopMutator 开始，深入理解了 HavocScheduledMutator 的使用和实现原理，这会让我们更好地理解 LibAFL 的 Mutator 结构，使用和实现我们希望的变异策略。

🚂QRZ的星穹列车

探索

深入理解 LibAFL：Mutator 篇（一）

目录

前言

HavocScheduledMutator

tokens_mutations

ScheduledMutator

简要总结

目录

关系图谱

反向链接