Basics of Fuzzing
20 Apr 2017Influenced by this awesome live stream by Gynvael Coldwind, where he talks about what fuzzing is about, and also builds a basic fuzzer from scratch!
What is a fuzzer, in the first place? And why do we use it?
Consider that we have a library/program that takes input data. The input may be structured in some way (say a PDF, or PNG, or XML, etc; but it doesn't need to be any "standard" format). From a security perspective, it is interesting if there is a security boundary between the input and the process / library / program, and we can pass some "special input" which causes unintended behaviour beyond that boundary. A fuzzer is one such way to do this. It does this by "mutating" things in the input (thereby possibly corrupting it), in order to lead to either a normal execution (including safely handled errors) or a crash. This can happen due to edge case logic not being handled well.
Crashing is the easiest way for error conditions. There might be others as well. For example, using ASAN (address sanitizer) etc might lead to detecting more things as well, which might be security issues. For example, a single byte overflow of a buffer might not cause a crash on its own, but by using ASAN, we might be able to catch even this with a fuzzer.
Another possible use for a fuzzer is that inputs generated by fuzzing one program can also possibly be used in another library/program and see if there are differences. For example, some high-precision math library errors were noticed like this. This doesn't usually lead to security issues though, so we won't concentrate on this much.
How does a fuzzer work?
A fuzzer is basically a mutate-execute-repeat loop that explores the state space of the application to try to "randomly" find states of a crash / security vuln. It does not find an exploit, just a vuln. The main part of the fuzzer is the mutator itself. More on this later.
Outputs from a fuzzer?
In the fuzzer, a debugger is (sometimes) attached to the application to get some kind of a report from the crash, to be able to analyze it later as security vuln vs a benign (but possibly important) crash.
How to determine what areas of programs are best to fuzz first?
When fuzzing, we want to usually concentrate on a single piece or small set of piece of the program. This is usually done mainly to reduce the amount of execution to be done. Usually, we concentrate on the parsing and processing only. Again, the security boundary matters a lot in deciding which parts matter to us.
Types of fuzzers?
Input samples given to the fuzzer are called the corpus. In oldschool fuzzers (aka "blind"/"dumb" fuzzzers) there was a necessity for a large corpus. Newer ones (aka "genetic" fuzzers, for example AFL) do not necessarily need such a large corpus, since they explore the state on their own.
How are fuzzers useful?
Fuzzers are mainly useful for "low hanging fruit". It won't find complicated logic bugs, but it can find easy to find bugs (which are actually sometimes easy to miss out during manual analysis). While I might say input throughout this note, and usually refer to an input file, it need not be just that. Fuzzers can handle inputs that might be stdin or input file or network socket or many others. Without too much loss of generality though, we can think of it as just a file for now.
How to write a (basic) fuzzer?
Again, it just needs to be a mutate-run-repeat loop. We need to be
able to call the target often (subprocess.Popen
). We also need to be
able to pass input into the program (eg: files) and detect crashes
(SIGSEGV
etc cause exceptions which can be caught). Now, we just
have to write a mutator for the input file, and keep calling the
target on the mutated files.
Mutators? What?!?
There can be multiple possible mutators. Easy (i.e. simple to
implement) ones might be to mutate bits, mutate bytes, or mutate to
"magic" values. To increase chance of crash, instead of changing only
1 bit or something, we can change multiple (maybe some parameterized
percentage of them?). We can also (instead of random mutations),
change bytes/words/dwords/etc to some "magic" values. The magic values
might be 0
, 0xff
, 0xffff
, 0xffffffff
, 0x80000000
(32-bit
INT_MIN
), 0x7fffffff
(32-bit INT_MAX
) etc. Basically, pick ones
that are common to causing security issues (because they might trigger
some edge cases). We can write smarter mutators if we know more info
about the program (for example, for string based integers, we might
write something that changes an integer string to "65536"
or -1
etc). Chunk based mutators might move pieces around (basically,
reorganizing input). Additive/appending mutators also work (for
example causing larger input into buffer). Truncators also might work
(for example, sometimes EOF might not be handled well). Basically, try
a whole bunch of creative ways of mangling things. The more experience
with respect to the program (and exploitation in general), the more
useful mutators might be possible.
But what is this "genetic" fuzzing?
That is probably a discussion for a later time. However, a couple of links to some modern (open source) fuzzers are AFL and honggfuzz.