I finally got around to writing this post, as well as updating the official Diaphora help.odt file.
This demo shows how easy it is to detect changes and deletions in patched binaries with Diaphora. I walk you through how to look for removed/replaced functions (_strcmp, _gets) in a vulnerable binary, and how to discover the two more secure replacement functions (_strncmp, _fgets) in the patched binary. Let's not waste any time and jmp right in!
Diaphora Support?
It's important to note that, for now, Diaphora is only supported on the two latest releases of IDA Pro. This means that current support covers only IDA Pro 6.7 and 6.8.
Running Diaphora
To run Diaphora, simply unpack the compressed distribution file (or perform a git clone, which I think is easier) wherever you prefer and execute “diaphora.py” directly from the IDA Pro menu File → Script file. I like to place the directory inside the IDA "%install%/scripts" folder. Please be advised that Joxean Koret (Diaphora author) will be releasing an update, for the two newest releases of IDA, that adds a new hot key feature. The new hot key functionality will let you assign a custom shortcut to open Diaphora instead of opening it manually. Once the script diaphora.py is executed, a dialog like the following one will be opened:
This dialog, although it can be a bit confusing at first, is used both for exporting the current IDA database to SQLite format and for diffing it against another database that was exported to SQLite format.
The first field is the path of the SQLite database file that will be created with all the information extracted from the current database.
The 2nd field is the other SQLite format database to diff the current database against. If this field is left empty, Diaphora will just export the current database to SQLite format. If the 2nd field is not empty, it will diff both databases.
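Since the export is a regular SQLite file, you can peek inside it with Python's standard sqlite3 module. Here is a minimal sketch that only lists the table names it finds. It makes no assumptions about Diaphora's schema, and the helper name list_tables is mine, not part of Diaphora.

```python
import os
import sqlite3


def list_tables(db_path):
    """Return the names of all tables in the SQLite file at db_path."""
    # sqlite3.connect() silently creates an empty file for a missing path,
    # so refuse early instead of leaving stray files around.
    if not os.path.isfile(db_path):
        raise FileNotFoundError(db_path)
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

Calling list_tables() with the path you typed into the first field should show what Diaphora stored, whatever the table names turn out to be in your version.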
The other fields, the check-boxes, are explained below:
- Use the decompiler if available. If the Hex-Rays decompiler is installed with IDA and the IDAPython bindings are available, Diaphora will use the decompiler to get a lot of interesting information that will help during the bindiffing process.
- Export only non-IDA generated functions. Enable if you neither want sub_* functions nor library functions to be exported.
- Do not export instructions and basic blocks. Export only function summaries, not all instructions. Showing differences in a graph between functions will not be available.
- Use probably unreliable methods. Diaphora uses many heuristics to try to match functions in both databases being compared. However, some heuristics are not really reliable, or the ratio of similarity they produce is very low. Check this box if you also want to see the likely unreliable matches Diaphora may find. Unreliable results are shown in their own list, so they are never mixed with the “Best results” (results with a ratio of 1.00) or the “Partial results” (results with a ratio of 0.50 or higher).
- Use slow heuristics. Some heuristics can be quite expensive and take a long time. For medium to big databases, this option is disabled by default, and it is recommended to leave it unchecked unless the results from a run with this option disabled are not good enough. It will likely find more and better matches than the normal, faster heuristics, but it will take significantly longer.
- Relaxed calculations of difference ratios. By default, Diaphora uses a fairly aggressive method to calculate difference ratios between matches. It's possible to relax that aggressiveness by checking this option. Under the hood, the function SequenceMatcher.quick_ratio is used when this option is unchecked and SequenceMatcher.real_quick_ratio when this option is checked. Also, when the option is checked, Diaphora will additionally use the difference ratio of the prime numbers calculated from the AST of the pseudo-code of the 2 functions, taking the highest ratio from the AST, assembly and pseudo-code comparisons.
- Use experimental heuristics. It says it all: experimental heuristics are enabled only if this check-box is marked. Disabled by default as they are likely not useful.
- Ignore automatically generated names. Enable this option to ignore sub_* names for the 'Same name' heuristic.
- Ignore all function names. Enable this option to ignore all function names for the 'Same name' heuristic.
- Ignore small functions. Enable this option to ignore thunk functions, nullstubs, etc.
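To get a feel for the two SequenceMatcher methods named in the “Relaxed calculations” option, here is a minimal sketch using Python's standard difflib module. The two instruction lists are made up for illustration, and this is not Diaphora's own code. It only shows that real_quick_ratio is a looser upper bound than quick_ratio, which in turn is an upper bound on the exact ratio.

```python
from difflib import SequenceMatcher

# Hypothetical, heavily simplified instruction listings (not from the real binaries).
vuln = ["push ebp", "mov ebp, esp", "call _gets", "call _strcmp", "leave", "retn"]
fixed = ["push ebp", "mov ebp, esp", "call _fgets", "call _strncmp", "leave", "retn"]

matcher = SequenceMatcher(None, vuln, fixed)

exact = matcher.ratio()                  # full comparison, the most expensive
quick = matcher.quick_ratio()            # upper bound that ignores element order
real_quick = matcher.real_quick_ratio()  # upper bound based on lengths only

print("ratio            %.3f" % exact)
print("quick_ratio      %.3f" % quick)
print("real_quick_ratio %.3f" % real_quick)
```

Because both lists have the same length, real_quick_ratio cannot see any difference here, which is why the relaxed option tends to report higher similarity.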
Diaphora quick start
Finding differences in new versions (Patch diffing)
In order to use Diaphora we need at least two binary files to compare. I will use two different versions of a small binary with buffer overflow vulnerabilities as an example.
- cbd98888a848fa5a4927ef2c2cf3c94c fixed_psswd-win86.exe (primary db)
- 2fc23ed48120710d67f2ee94e5d18de7 vuln_psswd-win86.exe (secondary db)
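If you want to make sure you are working with the same two files, you can check them against the MD5 values listed above. This is a small sketch using Python's standard hashlib module. The helper names md5_of_file and matches_expected are hypothetical, and the paths you pass in are up to you.

```python
import hashlib

# MD5 values as listed in this post.
EXPECTED_MD5 = {
    "fixed_psswd-win86.exe": "cbd98888a848fa5a4927ef2c2cf3c94c",
    "vuln_psswd-win86.exe": "2fc23ed48120710d67f2ee94e5d18de7",
}


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of the file at path, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, name):
    """True if the file at path has the MD5 listed above for name."""
    return md5_of_file(path) == EXPECTED_MD5[name]
```

For example, matches_expected("vuln_psswd-win86.exe", "vuln_psswd-win86.exe") should be True when run from the directory that holds the pre-patch binary.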
The file “vuln_psswd-win86.exe” is the pre-patch copy and the binary “fixed_psswd-win86.exe” is the fixed version. I start by launching IDA Pro 32-bit (idaq) and opening the file “fixed_psswd-win86.exe”. Once the initial auto-analysis finishes, I launch Diaphora by running the script “diaphora.py” from the IDA Pro menu File → Script file. The following dialog will open:
We only need to care about 2 things:
- Field “Export current database to SQLite”. This is the path to the SQLite database that will be created with all the information extracted from the IDA database of this binary.
- Field “Use the decompiler if available”. If the Hex-Rays decompiler is available and we want to use it, we will leave this check-box marked, otherwise uncheck it.
After selecting the appropriate values, press OK. Diaphora will start exporting all the data from the IDA database. When the export process finishes, the message “Database exported.” will appear in IDA's Output Window.
Now, save and close this database (fixed_psswd-win86.idb), and open the “vuln_psswd-win86.exe” binary. Wait until IDA's auto-analysis finishes and then run Diaphora as with the previous binary file. This time, in the 2nd field, the one named “SQLite database to diff”, select the path to the .sqlite file we exported in the previous step, as shown in the next figure:
After this, press the OK button. It will first export the current IDA database (vuln_psswd-win86.idb) to the SQLite format as understood by Diaphora and then, right after finishing, compare both databases.
IDA will show a wait box dialog while Diaphora runs its extensive checks: the current heuristics, the search for unmatched functions, and others are applied to match functions in both databases, as shown in the next figure:
Also, please note that you can re-open a closed tab by following the actions below.
After a while, a set of lists (choosers, in Hex-Rays parlance) will appear:
There is one more list that is not shown for this database, named “Unreliable matches”. This list holds all the matches that aren't considered reliable. However, in the case of this binary with symbols, there isn't even a single unreliable result. There are, however, unmatched functions in both the primary (the latest version) and the secondary database (the previous version):
The above image shows the functions not matched in the secondary database, that is, the functions removed in the latest patched version.
The second figure shows the functions in the primary database that have no match in the previous version, that is, the newly added functions:
It seems they removed two vulnerable functions (_strcmp, _gets) and replaced them with two more secure alternatives, _strncmp and _fgets.
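The idea behind the two unmatched lists can be sketched as a plain set difference on function names. This is a deliberate simplification: Diaphora matches functions with many heuristics, not just by name. The helper unmatched_by_name is hypothetical, and the _printf entry is an assumption added only so the lists have something in common besides _main.

```python
def unmatched_by_name(primary_names, secondary_names):
    """Return (only_in_primary, only_in_secondary) as sorted lists."""
    primary = set(primary_names)
    secondary = set(secondary_names)
    return sorted(primary - secondary), sorted(secondary - primary)


# Primary database: the fixed binary. Secondary database: the vulnerable one.
fixed_functions = ["_main", "_printf", "_strncmp", "_fgets"]
vuln_functions = ["_main", "_printf", "_strcmp", "_gets"]

added, removed = unmatched_by_name(fixed_functions, vuln_functions)
print("Added in the patched version:    ", added)
print("Removed from the patched version:", removed)
```

The names only present on the secondary side are the removed functions, and the names only present on the primary side are the new ones.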
Let's take a look now at the “Best matches” tab:
There are many functions in the “Best matches” tab, 24 functions to be exact, and in the primary database there are only a few. The results shown in the “Best matches” tab above are the functions matched by an “equal” heuristic (like “100% equal”, where all attributes are equal, or “Equal pseudo-code”, where the pseudo-code generated by the decompiler is equal) and that, apparently, don't have any differences at all. If you're diffing these binaries to find fixed vulnerabilities, just skip this tab; you will be more interested in the “Partial matches” one ;)
In the “Partial matches” tab we have only two results:
It shows the functions matched between both databases and, in the description field, it says which heuristic matched. The results also display the ratio of differences. If you're looking for functions where a vulnerability was likely fixed, this is where you want to look. It seems that the function “_main”, for example, was slightly modified: the ratio is 0.810, which means that a small % of the function differs between both databases.
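As a rough mental model of how a ratio maps to a tab, here is a small sketch using the thresholds mentioned in the check-box descriptions (1.00 for best, 0.50 or higher for partial). Treat it as an assumption: the function classify_match is hypothetical, and Diaphora's real logic also depends on which heuristic produced the match.

```python
def classify_match(ratio):
    """Map a similarity ratio in [0, 1] to the tab it would roughly land in."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    if ratio == 1.0:
        return "Best matches"
    if ratio >= 0.5:
        return "Partial matches"
    return "Unreliable matches"


# The only ratio quoted in this post is the one for _main.
print("_main", 0.810, classify_match(0.810))
```

With a ratio of 0.810, _main falls into the partial bucket, which agrees with where it shows up above.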
Diaphora has three different view modes:
- Assembly graph (Diff assembly in a graph)
- Plain assembly (Diff assembly)
- Pseudo-code (Diff pseudo-code)
Let's see the differences:
- Assembly graph (Diff assembly in a graph)
Right click on the result and select “Diff assembly in a graph”. The following graph will appear:
Since this is a very small binary with very few changes, it does not have any yellow nodes (minor changes). Please note that the skin I use in IDA replaces the white background with a black one, so white nodes (no changes) show up as black above.
Quick example:
Let's have a look at a bigger binary that has multiple changes below.
The nodes in yellow are those with only minor changes; the pink ones are those that are either new or heavily modified; and the blank ones are the basic blocks that were not modified at all.
- Plain assembly (Diff assembly)
Now let's diff the assembly in plain text: go back to the “Partial matches” tab, right click on the function “_main” and select “Diff assembly”:
It shows the differences, in plain assembly, that one would see by using a tool like the Unix command “diff”.
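If you have never used “diff”, the following sketch produces that kind of output with Python's standard difflib module. The two listings are hypothetical and heavily simplified, not the real disassembly of _main, and this is not how Diaphora itself renders its view.

```python
import difflib

# Hypothetical, heavily simplified listings of _main (not the real disassembly).
vuln_main = [
    "push ebp",
    "mov ebp, esp",
    "call _gets",
    "call _strcmp",
    "leave",
    "retn",
]
fixed_main = [
    "push ebp",
    "mov ebp, esp",
    "call _fgets",
    "call _strncmp",
    "leave",
    "retn",
]

diff_lines = list(
    difflib.unified_diff(
        vuln_main,
        fixed_main,
        fromfile="vuln_psswd-win86.exe",
        tofile="fixed_psswd-win86.exe",
        lineterm="",
    )
)
print("\n".join(diff_lines))
```

Lines starting with "-" exist only in the vulnerable listing and lines starting with "+" only in the fixed one, which is the same information the plain assembly view conveys.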
- Pseudo-code (Diff pseudo-code)
We can also diff the pseudo-code: go back to the “Partial matches” tab, right click on the function and select “Diff pseudo-code”:
As we can see, it shows all the differences in the pseudo-code in a side by side comparison, like with the assembly diff. Once you know how the 3 different ways to see differences work, you can choose your favorite or use all 3 for specific cases.
Next post will show how to write a PoC exploit for the vulnerable binary above. Check back soon.