bbxfuzz: Source-Aware Stall Breaking for BusyBox Fuzzing

6 minute read

Summary

I built bbxfuzz because mutation-based fuzzing can get stuck in front of complex branch conditions. AFL++ is excellent at executing inputs and measuring coverage, but simple input mutation often cannot infer the exact string, numeric, parser-state, or code-flow condition needed to enter a deeper path. The result is a long coverage stall at a specific edge or branch.

BusyBox applets were a good target for testing this idea. They include small languages and structured formats: awk scripts, shell scripts, ed commands, vi commands, dpkg archives, HTTP requests, and more. That variety creates many places where raw mutation can run for a long time without producing the one input shape or branch predicate that unlocks the next path.

The design insight was to split the work by what each component is good at:

AFL++ stays in charge of execution, coverage, crashes, and truth.
A local grammar mutator keeps feeding syntactically useful inputs without network calls.
A local solver gets the first chance at simple stalled branches.
Codex is used only when a coverage stall needs source-aware reasoning, with the blocked branch condition, code flow, solver diagnostics, and the closest corpus input.
Any Codex output goes back into the fuzzer as seed files and grammar changes, not as a trusted conclusion.

That made the LLM a source-aware stall breaker instead of a blind generator.

Figure: bbxfuzz design. AFL++ runs local grammar mutation first, then coverage stalls are escalated to Codex through a local solver and source-backed review tasks. bbxfuzz keeps the fast path local and escalates only when the fuzzer has concrete coverage evidence that it is stuck.

The Problem I Faced

The core problem was not just invalid syntax. It was that mutation-only fuzzing gives no explanation of why a near-miss input failed a branch condition.

BusyBox is a multi-call binary, and each applet has a different input contract. Some applets read scripts from files, some read commands from stdin, some parse binary archive formats, and some are better tested through small harnesses. Treating them all as raw bytes made the fuzzer waste work on invalid inputs, but even syntactically valid inputs could still stop before a hard branch.

That is where I wanted to use an LLM. Not as a replacement fuzzer, and not as a random test-case generator, but as a source-aware assistant for the exact moment where AFL++ had coverage evidence that it was stuck. I needed a workflow where the model only saw a small, evidence-backed task:

Here is the blocked branch.
Here is the enclosing C source.
Here is the closest input that already reaches the area.
Explain why the current input fails the condition.
Generate a few minimal bypass seeds and update the grammar if the source reveals a missing token, constant, or production.

That framing matters because the fuzzer, not the model, decides whether the answer was useful.

How I Solved It

bbxfuzz has three practical phases.

Phase 0: Initialize the Applet

Before fuzzing an applet, the tool builds applet-specific artifacts: a grammar JSON file, initial seeds, and variant-hunting prompts. The applet registry records the BusyBox source files, input type, input description, and AFL++ feeding mode for each target. This keeps awk, hush, vi, dpkg, udhcpc, and the other applets from being treated as one generic byte stream.
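As an illustration of what one registry entry might hold, here is a minimal Python sketch. The field names, the `FeedMode` values, the `./busybox` path, and the two sample entries are assumptions made for this example, not the actual bbxfuzz schema. The `@@` token is the AFL++ placeholder for the generated input file.

```python
from dataclasses import dataclass
from enum import Enum


class FeedMode(Enum):
    """How AFL++ hands a test case to the target (illustrative names)."""
    STDIN = "stdin"      # test case is piped to standard input
    FILE = "file"        # test case path replaces the AFL++ @@ placeholder
    HARNESS = "harness"  # test case is consumed by a small dedicated harness


@dataclass(frozen=True)
class AppletTarget:
    name: str
    source_files: tuple        # BusyBox C files relevant to this applet
    input_type: str            # e.g. "script", "commands", "archive"
    input_description: str     # short human-readable input contract
    feed_mode: FeedMode


# Hypothetical entries, for illustration only.
REGISTRY = {
    t.name: t
    for t in (
        AppletTarget("awk", ("editors/awk.c",), "script",
                     "awk program text", FeedMode.FILE),
        AppletTarget("ed", ("editors/ed.c",), "commands",
                     "ed commands read from stdin", FeedMode.STDIN),
    )
}


def afl_target_args(target):
    """Build the target part of an afl-fuzz command line for one applet."""
    args = ["./busybox", target.name]
    if target.feed_mode is FeedMode.FILE:
        # A real applet may need a flag before the path (awk uses -f).
        args.append("@@")
    return args
```

Keeping the feeding mode next to the source files means the same record can drive both the AFL++ command line and the later source-context extraction.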

Phase 1: Mutate Locally

The local Tier 1 mutator reads the grammar and produces new seeds without calling an LLM. It combines grammar derivation, corpus-aware mutation, and strategy templates. The current default configuration uses a 60-second mutation cycle, 50 seeds per batch, and a 1 MB generated-seed cap.
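The grammar-derivation part of this phase can be sketched as follows. This is a minimal example, not the bbxfuzz mutator: the toy awk-like grammar and the function names are invented here, corpus-aware mutation and strategy templates are left out, and the 1 MB cap is interpreted as a per-seed size limit, which is an assumption.

```python
import random

# Toy grammar: keys in angle brackets are nonterminals, each mapping to a
# list of alternatives; any other string is a literal token.
GRAMMAR = {
    "<program>": [["<pattern>", " ", "<action>"], ["<action>"]],
    "<pattern>": [["/", "<word>", "/"], ["NR==", "<num>"]],
    "<action>": [["{ print ", "<expr>", " }"]],
    "<expr>": [["$", "<num>"], ["length(", "<expr>", ")"]],
    "<word>": [["foo"], ["a.*b"]],
    "<num>": [["0"], ["1"], ["65535"]],
}

SEEDS_PER_BATCH = 50           # default batch size mentioned in the text
MAX_SEED_BYTES = 1024 * 1024   # 1 MB cap, assumed here to apply per seed


def derive(symbol, rng, depth=0, max_depth=8):
    """Expand a symbol into a string by random grammar derivation."""
    if symbol not in GRAMMAR:
        return symbol  # literal token
    alternatives = GRAMMAR[symbol]
    if depth >= max_depth:
        # Force termination: take the shortest alternative, which in this
        # toy grammar is never the recursive one.
        alternatives = [min(alternatives, key=len)]
    choice = rng.choice(alternatives)
    return "".join(derive(s, rng, depth + 1, max_depth) for s in choice)


def generate_batch(seed=0):
    """Produce one batch of seeds, dropping any that exceed the size cap."""
    rng = random.Random(seed)
    batch = []
    for _ in range(SEEDS_PER_BATCH):
        data = derive("<program>", rng).encode()
        if len(data) <= MAX_SEED_BYTES:
            batch.append(data)
    return batch
```

Seeding the random generator makes a batch reproducible, which helps when a later coverage gain needs to be traced back to the grammar state that produced it.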

This is the speed layer. It is cheap, repeatable, and does not need a model in the loop.

Phase 2: Solve or Escalate Stalls With Source Context

When edge growth stays below the configured threshold across the stall window, bbxfuzz takes a coverage snapshot, finds stuck branches, and extracts nearby C context. A local regex-based constraint solver tries first. If it can patch a seed for a simple branch condition, the fuzzer can keep moving without Codex.
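A rough sketch of the two local steps described above, stall detection and a regex-based seed patch, might look like this. The function names, the sampling scheme, and the single `strcmp` pattern are assumptions for illustration; a real solver would need to cover many more condition shapes.

```python
import re


def is_stalled(edge_counts, window, min_new_edges):
    """Return True when edge growth over the last `window` samples is too low.

    edge_counts holds cumulative edge totals, one sample per interval.
    """
    if len(edge_counts) <= window:
        return False  # not enough history to judge
    growth = edge_counts[-1] - edge_counts[-1 - window]
    return growth < min_new_edges


# Matches C conditions such as: strcmp(cmd, "BEGIN") == 0
# Literals containing backslash escapes are deliberately not handled.
STRCMP_EQ = re.compile(r'strcmp\(\s*\w+\s*,\s*"([^"\\]*)"\s*\)\s*==\s*0')


def solve_branch(condition, closest_seed):
    """Propose patched seeds for a simple string-equality branch.

    Returns an empty list when the condition is not one this toy solver
    understands, which is the signal to consider escalation.
    """
    match = STRCMP_EQ.search(condition)
    if match is None:
        return []
    literal = match.group(1).encode()
    return [
        literal + b"\n",                        # the literal on its own
        closest_seed + b" " + literal + b"\n",  # literal appended to the near miss
    ]
```

For example, `solve_branch('if (strcmp(cmd, "BEGIN") == 0)', b"x")` yields two candidate seeds, while a numeric condition such as `if (n > 3)` yields none. Either way, the candidates are only proposals until AFL++ runs them.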

When the solver produces no seeds, or after repeated solver-only rounds, the stalled branch is escalated into a reviewable task in my private workflow. The work item is intentionally narrow: use the supplied evidence to generate bypass seeds and update the grammar.
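The escalation rule in this paragraph can be expressed as a small decision function. This is a sketch under assumptions: the limit of three solver-only rounds and all names are hypothetical, since the actual threshold is not stated here.

```python
from dataclasses import dataclass

MAX_SOLVER_ONLY_ROUNDS = 3  # assumed value, not taken from bbxfuzz


@dataclass
class StallState:
    """Tracks how many consecutive rounds were handled by the solver alone."""
    solver_only_rounds: int = 0


def next_action(state, solver_seeds):
    """Decide whether to use the solver's seeds or escalate the stall."""
    if not solver_seeds:
        # The solver had nothing to offer for this branch.
        state.solver_only_rounds = 0
        return "escalate"
    if state.solver_only_rounds >= MAX_SOLVER_ONLY_ROUNDS:
        # The solver keeps producing seeds, yet the stall persists.
        state.solver_only_rounds = 0
        return "escalate"
    state.solver_only_rounds += 1
    return "use_solver_seeds"
```

The second condition matters because a solver that always returns something can otherwise hide a stall it is not actually breaking.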

The important detail is that the prompt is not just “make more inputs.” It carries the branch condition, function body, call chain, covered siblings, closest corpus input, and solver diagnostics when available. That is what lets the model reason about the actual path constraint and code flow instead of guessing.
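To make the shape of that evidence concrete, here is a minimal sketch of a bundle and a renderer that turns it into task text. The fields mirror the items listed above, but the class name, section titles, and output layout are hypothetical, not the actual prompt format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StallEvidence:
    branch_condition: str       # the C condition AFL++ cannot get past
    function_body: str          # enclosing C function source
    call_chain: list            # function names from entry point to the branch
    covered_siblings: list      # nearby branches that are already covered
    closest_input: bytes        # corpus entry that reaches the area
    solver_diagnostics: Optional[str] = None  # present only if the solver ran


def render_task(ev):
    """Render the evidence as plain-text sections followed by the task."""
    sections = [
        ("Blocked branch condition", ev.branch_condition),
        ("Enclosing function", ev.function_body),
        ("Call chain", " -> ".join(ev.call_chain)),
        ("Covered sibling branches", "\n".join(ev.covered_siblings)),
        ("Closest corpus input", repr(ev.closest_input)),
    ]
    if ev.solver_diagnostics:
        sections.append(("Solver diagnostics", ev.solver_diagnostics))
    body = "\n\n".join(f"[{title}]\n{text}" for title, text in sections)
    task = ("Explain why the closest input fails the condition, then "
            "propose minimal bypass seeds and any grammar additions.")
    return body + "\n\n[Task]\n" + task
```

Rendering from a fixed structure keeps every task the same size and shape, so a missing piece of evidence is visible as a missing section rather than a vaguer prompt.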

The Design Insight

The useful boundary was this:

Use models for source-aware decisions; use fuzzers for execution truth.

I did not want the model to replace AFL++. I wanted it to read the part of the C source that AFL++ had already identified as interesting, then produce small artifacts that AFL++ could immediately accept or reject.

This also changed the grammar from a static file into a campaign memory. When a stall reveals a missing keyword, magic value, command form, or boundary token, the grammar can be updated so future local mutations start closer to that path.
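One way to picture the grammar acting as campaign memory is an idempotent update applied after each resolved stall. The sketch below assumes the grammar JSON maps each nonterminal to a list of alternatives; that shape and the function name are assumptions, not the real file format.

```python
import json


def add_alternative(grammar, nonterminal, alternative):
    """Add one alternative to a nonterminal; return True if the grammar changed."""
    alternatives = grammar.setdefault(nonterminal, [])
    if alternative in alternatives:
        return False  # already known, so nothing to record
    alternatives.append(alternative)
    return True


# Example: a stall revealed that the keyword "END" was missing.
grammar = {"<keyword>": [["BEGIN"]]}
changed = add_alternative(grammar, "<keyword>", ["END"])

# The updated grammar survives a JSON round trip, so it can be written
# back to the applet's grammar file for the next mutation cycle.
serialized = json.dumps(grammar, indent=2, sort_keys=True)
```

Returning whether anything changed makes it easy to log only real grammar diffs and to skip regenerating seeds when a stall taught the grammar nothing new.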

Results

The project currently registers 24 BusyBox applet targets, with four blacklisted because they reached coverage plateaus. The live applet set spans editors, shells, coreutils-like tools, archive parsers, and network-facing inputs.

For the vulnerability-reporting side of the work, I keep the public details intentionally high level here. The same loop has produced BusyBox vulnerability reports that I track locally through mailing-list and MITRE CNA-LR disclosure paths. Some reports have assigned CVEs, and another bundle is still recorded locally as pending. While anything remains pending, I am not publishing exact identifiers, submission dates, PoC names, commit hashes, or applet-to-report mapping in this post.

For validation, the local reports use a RED/GREEN gate: reproduce on the unpatched target, apply the generated patch emails, and re-run the ASAN checks against the email-applied tree. That kept the disclosure side tied to evidence instead of model output.
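The gate logic itself is small. The following sketch assumes the reproducer is fed on stdin and that both trees are already built with ASAN; the function names and result strings are hypothetical, and applying the patch emails is left out. The marker string is the prefix AddressSanitizer prints at the start of an error report.

```python
import subprocess

ASAN_MARKER = b"ERROR: AddressSanitizer"


def asan_fires(argv, poc, timeout=10):
    """Run one target with the reproducer on stdin; True if ASAN reported an error."""
    proc = subprocess.run(argv, input=poc, capture_output=True, timeout=timeout)
    return ASAN_MARKER in proc.stderr


def red_green_gate(poc, unpatched_argv, patched_argv, run=asan_fires):
    """Check RED on the unpatched build, then GREEN on the patched build."""
    if not run(unpatched_argv, poc):
        return "not-reproduced"    # no RED, so there is nothing to validate
    if run(patched_argv, poc):
        return "patch-incomplete"  # still RED after applying the patch
    return "pass"                  # RED before, GREEN after
```

Checking RED first prevents a patch from being counted as a fix for a crash that never reproduced in the first place. The `run` parameter exists so the gate can be exercised with a stub instead of real binaries.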

The result I care about is not just “LLM found bugs.” The result is a loop:

AFL++ discovers coverage reality.
bbxfuzz extracts the stuck source context.
The local solver tries cheap seed patches first.
Codex proposes small seed and grammar changes when local solving is not enough.
AFL++ tests those changes.
Crashes are reproduced, patched, and rechecked before disclosure.

That loop gave me a way to move from coverage stalls to source-backed reports without trusting unverified model output.

What Worked Well

The strongest part of the design was forcing every high-level idea to become a file the fuzzer could use. A seed either increases coverage, triggers a crash, or does nothing. A grammar update either generates useful shapes or it does not. A patch either turns RED reproducer runs into GREEN runs or it does not.

The reviewable task workflow also helped. It turned “ask Codex” into a small queue of source-backed changes: each stall had an evidence bundle, a seed update, and a grammar diff.

What I Would Improve

I would keep improving the feedback loop between coverage and grammar updates. The more precisely the tool can say “this token or production is missing,” the less manual reasoning is needed at each stall. I would also keep the public disclosure path separate from the fuzzing path, because pending vulnerability details should not leak just because the fuzzer generated a good internal report.

Accuracy Boundary

This post only states details I checked against the local repository and report artifacts on 2026-07-01. It intentionally does not include pending PoCs, exploit details, unassigned CVE identifiers, or report-to-applet mapping. If the MITRE status changes later, this post should be updated instead of treating the 2026-07-01 status as permanent.


Sanghyun Park