Cyan Examiner finds illegal content fast, using statistical techniques to achieve extraordinary speeds. Results are given to a ‘level of confidence’, and robust testing ensures that level is consistently achieved.
The use of a statistical process is unusual in digital forensics. In this post we explore what our confidence level actually means and how we test the software. This is a detailed post about how Cyan Examiner works. If this is your first time here, consider reading more about what we do.
The answer is a red or green result:
- Red is easy to explain: “Yes. There’s a match.” The match can then be viewed, allowing an operator to confirm the result.
- The green answer, “No, there is no match, to a 99% confidence level”, needs a little more explanation.
Most digital forensics tools are deterministic – they give the same result every time. Cyan Examiner uses a statistical process, and so negative results (greens) have an associated confidence level.
Using Statistics in Digital Forensics
Cyan Examiner is fast. Part of that speed is achieved by sampling the disk, rather than reading everything on it.
The confidence level setting in Cyan Examiner determines how the sampling process is tuned, trading confidence against scanning time. (For example, when operating at a 99% confidence level Cyan Examiner will typically scan a 1TB mechanical drive in less than 30 minutes.)
A confidence level of 99% means that if there is content matching the Contraband Filter on the device, Cyan Examiner will find a match at least 99% of the time.
Along with the confidence level, the other important setting is target size, which defaults to 20MiB.
If there is exactly 20MiB of contraband on a disk and a setting of 99% confidence is used, at least 99% of scans will find a match, but up to 1% of scans could find no matches. If there is less than 20MiB of contraband, the proportion of results with no matches could be higher.
In the real world many devices will contain more than 20MiB of contraband material, and so the proportion of scans finding a match (and giving a “red” result) will be much, much higher than 99%.
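The relationship between target size, confidence level and the amount of contraband can be illustrated with a deliberately simple model. This is a sketch under assumptions that are ours, not a description of Cyan Examiner's internals: samples are independent and uniformly random, each sample hits contraband with probability equal to the contraband's share of the disk, and any hit produces a match. The function names and the 1TB disk size are illustrative only.

```python
import math

MIB = 1024 ** 2   # bytes in one mebibyte
TB = 1000 ** 4    # bytes in one (decimal) terabyte


def samples_needed(confidence, target_bytes, disk_bytes):
    """Smallest n with 1 - (1 - f)^n >= confidence, where f = target/disk.

    Assumes independent, uniformly random samples (a simplification).
    """
    f = target_bytes / disk_bytes
    return math.ceil(math.log(1 - confidence) / math.log1p(-f))


def detection_probability(n_samples, contraband_bytes, disk_bytes):
    """Chance that at least one of n_samples lands on contraband."""
    f = contraband_bytes / disk_bytes
    return 1 - (1 - f) ** n_samples


# Tune the sample count for a 20MiB target at 99% confidence on a 1TB disk,
# then see how the chance of a red result changes with the actual amount.
n = samples_needed(0.99, 20 * MIB, 1 * TB)
print(f"samples per scan in this toy model: {n}")
for mib in (5, 20, 40, 100):
    p = detection_probability(n, mib * MIB, 1 * TB)
    print(f"{mib:>4} MiB of contraband -> P(red) = {p:.6f}")
```

In this model, doubling the contraband from 20MiB to 40MiB roughly squares the chance of a miss, which is why real devices holding more than the target size are found far more often than the headline confidence level suggests.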
How do you test a tool that does not give absolute results?
Traditional forensics software produces the same result every time it is used, so it is easy to compare the actual result against the expected result.
Testing something that is designed to give one answer at least 99% of the time, but which can occasionally legitimately give a different answer, requires a different approach.
Instead of testing once, the tests must be run many times and the spread of results checked against statistical models.
Data, Hypothesis, Test, Repeat…
When testing internally we start by creating a test dataset – a test disk containing an exact known quantity of contraband.
We can then calculate precisely the mix of red and green results that could be expected when running the tool on that calibrated test dataset, based on the tool’s settings of target size and confidence level.
This gives us hypotheses to test like: “when configured to a 99% confidence level, the tool is working as designed if it finds a match at least ninety-nine times in one hundred scans”.
However, the tool may find something in 100 of 100 scans, particularly as it has been built using conservative design principles and tends to ‘err on the safe side’. To be sure that our testing encounters scans that don’t find a match, and to get a complete picture of statistical performance, we must run the scan many, many times.
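To see why one hundred scans are not enough, it helps to treat each scan as an independent trial that comes back red with some fixed probability. The sketch below uses that assumption (ours, for illustration) together with hypothetical helper names. It shows how often a tool running at exactly 99% would still produce 100 reds out of 100, and how a batch of 10,000 scans can be checked against the target with an exact binomial tail probability.

```python
import math


def probability_all_red(scans, red_rate):
    """Chance that every one of `scans` independent scans comes back red."""
    return red_rate ** scans


def log_binom_pmf(k, n, p):
    """Log of P(X = k) for X ~ Binomial(n, p), computed in log space
    so that large n does not overflow."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def prob_at_most(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p): how likely it is to see k or fewer
    reds if the true red rate were exactly p."""
    total = sum(math.exp(log_binom_pmf(i, n, p)) for i in range(k + 1))
    return min(1.0, total)


# A tool sitting exactly on the 99% target still gives a "perfect" batch of
# 100 scans quite often, so a small batch says little about the true rate.
print(f"P(100 of 100 red at a true 99% rate): {probability_all_red(100, 0.99):.3f}")
print(f"P(10,000 of 10,000 red at 99%):       {probability_all_red(10_000, 0.99):.3g}")

# Made-up outcome for illustration only: 9,850 reds in 10,000 scans.
# A very small tail probability would indicate the target is NOT being met.
tail = prob_at_most(9_850, 10_000, 0.99)
print(f"P(at most 9,850 reds | true rate 99%): {tail:.3g}")
```

At a true rate of exactly 99%, a batch of 100 scans comes back all red about 37% of the time, whereas a batch of 10,000 almost never does, so only the larger batch reliably exposes the green results the statistics predict.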
Here is an example based on recent testing: we ran the tool on the test dataset 10,000 times at each of the 90%, 95% and 99% confidence levels.
Displaying these graphically, the gold bar is the minimum standard that must be achieved, and the cyan bar is the actual test result.
This shows that in each of the test cases Cyan Examiner’s performance exceeded the standard required of it.
The graph also shows a by-product of our focus on speed. At higher confidence levels the number of samples required increases dramatically. By design we ensure that the confidence level target is met without taking excessive time, so the amount by which the target is exceeded gets smaller as the target increases. Testing at higher confidence levels verifies this (although above 99% confidence the testing becomes very time-consuming!).
Is One Match Enough?
“Is there any known illegal content on this disk?” is a great yes/no question for triage and prioritisation, but it is throwing away information that is useful for testing.
Finding a ‘yes’ actually means finding ‘one or more matches’, and most of the time when we get a ‘yes’ the number of matches found will be greater than one.
The distribution of the number of matches we expect to find can be calculated from the binomial distribution for any given test dataset and settings.
This gives us a second test to ensure that the software is working as designed. For the calibrated test dataset and software settings, we calculate both the distribution we would expect if we were exactly meeting the target confidence (the minimum acceptable), and the distribution that represents the theoretical maximum result based on the internal working of the software. The actual result should lie between these two.
In the following graph the gold line is the minimum acceptable result and the green line the maximum likely result. The blue line is the distribution from a run of 10,000 scans.
Looking at this graph we can see the blue curve matches the shape of the two predictions, and sits between the minimum and maximum curves when considered either horizontally or vertically. This means our test results are in the expected range, and gives us great confidence that the statistical sampling is performing exactly as expected.
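The ‘minimum acceptable’ curve can be approximated with a short calculation. The sketch below assumes a simplified model that is ours rather than the product's actual algorithm: every sample is independent and matches with probability equal to the contraband's share of the disk, so the number of matches per scan follows a binomial distribution. The 1TB disk size and all names are illustrative, and the ‘theoretical maximum’ curve is not modelled because it depends on the internal working of the software.

```python
import math

MIB = 1024 ** 2   # bytes in one mebibyte
TB = 1000 ** 4    # bytes in one (decimal) terabyte


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), evaluated in log space for large n."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)


# Assumed setup: exactly 20MiB of contraband on a 1TB disk, with the sample
# count tuned so that the chance of zero matches is at most 1%.
hit_chance = 20 * MIB / TB
n_samples = math.ceil(math.log(1 - 0.99) / math.log1p(-hit_chance))

# Expected number of scans (out of 10,000) that find exactly k matches.
scans = 10_000
expected_counts = [scans * binom_pmf(k, n_samples, hit_chance) for k in range(16)]
for k, count in enumerate(expected_counts):
    print(f"{k:>2} matches: about {count:7.1f} of {scans} scans")
```

The first row (zero matches) is the expected number of green results, and the remaining rows give the shape a histogram of real test runs can be compared against.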
Cyan Examiner returns results fast, but that doesn’t mean we take shortcuts on testing. It is important to us that Cyan Examiner reliably and consistently achieves the stated confidence level.
Many forensics processes are now required to be accredited to ISO/IEC 17025:2017. Fortunately, ISO 17025 was originally intended for laboratory use, where measurements that show statistical variation are commonplace. This means that ISO 17025 is inherently designed to allow for sampling processes backed by relevant statistical testing for validation.
We have produced a ‘Capabilities and Limitations’ document which contains detailed information on Cyan Examiner’s expected performance, to aid with validation and verification.
We are constantly improving our testing, and we routinely run batches of 10,000 or more scans to check the achieved confidence level and results distribution. It is a time-consuming process, and we have internal tools to automate testing where possible.
If you’re interested in conducting this type of testing on Cyan Examiner yourself, please get in touch. We would also love to hear from you if you are testing our software in your organisation, as we’re always open to learning from the work of others and finding out how we can better support it.