Processing of results

As explained in the Configuration chapter, once an analysis is completed Cuckoo invokes a script that can be used to access and manipulate the results produced. It is meant to be customized by the user to do whatever they need.

This script (called the “processor”) runs concurrently with Cuckoo, which makes it completely independent of the sandbox execution. It takes the path to the analysis results as its argument.

The default processor looks like the following:

import sys

from cuckoo.processing.data import CuckooDict
from cuckoo.reporting.reporter import ReportProcessor

def main(analysis_path):
    # Generate reports out of abstracted analysis results.
    ReportProcessor().report(CuckooDict(analysis_path).process())

if __name__ == "__main__":
    if len(sys.argv) < 2:
        print "Not enough args."
        sys.exit(-1)

    main(sys.argv[1])

It obtains a dictionary from the raw results and triggers the generation of the enabled reports, as explained in Configuration.

To simplify some of the processing tasks you might need to perform, Cuckoo provides some ready-to-use functions and classes, which are generally located in “cuckoo/processing/”.

Retrieving details on a file

The first thing you might be interested in is retrieving some details on the binary you just analyzed. For this purpose there’s a dedicated class called File, provided by the module cuckoo.processing.file. It takes the path to a file as argument, and invoking process() returns a dictionary containing some static details. You can use this class on any file you want, including dropped files.

Following is an example usage and output:

>>> import pprint
>>> from cuckoo.processing.file import File
>>> details = File("analysis/1/malware.exe").process()
>>> pprint.pprint(details)
{'crc32': '76652E7B',
 'md5': '9b2de8b062a5538d2a126ba93835d1e9',
 'name': 'malware.exe',
 'sha1': 'f3b2025f64aaec787b1009223927b78b1677b92a',
 'sha256': '676a818365c573e236245e8182db87ba1bc021c5d8ee7443b9f673f26e7fd7d1',
 'sha512': '807142b3141bddbf5b2c2be78ff755433fca67b3f78ea7c5f7e74001614097a2bf439d90fa6ab415e736c59829be40d8c220f60117478e1a1ee372a97faa8fcb',
 'size': 194560,
 'ssdeep': '3072:J9GgqeRehDMVYQKSGJhJX11o0wojolTmXJmfEaQHNo8+PZ7ya4aMi4ry0zxLbnJG:J9JqeohDMODSGFX11o0wo0AJ4+a82Z7U',
 'type': 'PE32 executable for MS Windows (GUI) Intel 80386 32-bit'}
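
To make the fields above more concrete, the following is a minimal sketch of how the hash and size entries could be recomputed with the Python standard library alone, for example to double-check a dropped file. It is not Cuckoo's implementation: the function name basic_file_details is hypothetical, and the ssdeep and type fields are left out because they need third-party libraries.

```python
import hashlib
import os
import zlib


def basic_file_details(path, chunk_size=65536):
    """Recompute the standard-library subset of the fields shown above.

    Hypothetical helper for illustration, not part of Cuckoo.
    """
    hashes = {
        "md5": hashlib.md5(),
        "sha1": hashlib.sha1(),
        "sha256": hashlib.sha256(),
        "sha512": hashlib.sha512(),
    }
    crc = 0
    size = 0

    # Read in chunks so large samples are not loaded into memory at once.
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            size += len(chunk)
            crc = zlib.crc32(chunk, crc)
            for digest in hashes.values():
                digest.update(chunk)

    details = dict((name, digest.hexdigest()) for name, digest in hashes.items())
    details["name"] = os.path.basename(path)
    details["size"] = size
    # Mask to 32 bits so the value is unsigned on every Python version.
    details["crc32"] = "%08X" % (crc & 0xFFFFFFFF)
    return details
```

Comparing the values returned by such a helper with the dictionary produced by File is one quick way to confirm that a sample was not altered after the analysis.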

Processing behavioral analysis results

As described in Analysis Results, Cuckoo generates CSV-like raw logs for every process it monitored. These logs contain all the Win32 API calls that Cuckoo was able to intercept while tracking the processes. To make this information more accessible, you can use the Analysis class provided by the module cuckoo.processing.analysis.

This class takes the path to the log files as argument. Calling its process() function returns the behavioral results in a structured format: a list with one dictionary per monitored process.

Following is an example usage and output:

>>> import pprint
>>> from cuckoo.processing.analysis import Analysis
>>> results = Analysis("analysis/1/logs/").process()
>>> pprint.pprint(results)
[{'calls': [{'api': 'LoadLibraryA',
             'arguments': [{'name': 'lpFileName', 'value': 'KERNEL32.DLL'}],
             'repeated': 0,
             'return': '0x7c800000',
             'status': 'SUCCESS',
             'timestamp': '20111219100536.679'},

            [...]

            {'api': 'VirtualAllocEx',
             'arguments': [{'name': 'th32ProcessID', 'value': '764'},
                           {'name': 'szExeFile', 'value': 'binary.exe'},
                           {'name': 'lpAddress', 'value': '0x00000000'},
                           {'name': 'dwSize', 'value': '4826'},
                           {'name': 'flAllocationType',
                            'value': '0x00003000'},
                           {'name': 'flProtect', 'value': '0x00000040'}],
             'repeated': 0,
             'return': '0x00150000',
             'status': 'SUCCESS',
             'timestamp': '20111219100536.679'},
            {'api': 'CreateFileW',
             'arguments': [{'name': 'lpFileName',
                            'value': 'C:\\WINDOWS\\system32\\svchost.exe'},
                           {'name': 'dwDesiredAccess',
                            'value': 'GENERIC_READ'}],
             'repeated': 1,
             'return': '0x000000b4',
             'status': 'SUCCESS',
             'timestamp': '20111219100546.734'},
            {'api': 'CreateProcessA',
             'arguments': [{'name': 'lpApplicationName',
                            'value': '(null)'},
                           {'name': 'lpCommandLine',
                            'value': 'svchost.exe'}],
             'repeated': 0,
             'return': '1548',
             'status': 'SUCCESS',
             'timestamp': '20111219100546.734'},
            {'api': 'VirtualAllocEx',
             'arguments': [{'name': 'th32ProcessID', 'value': '1548'},
                           {'name': 'szExeFile', 'value': 'svchost.exe'},
                           {'name': 'lpAddress', 'value': '0x00000000'},
                           {'name': 'dwSize', 'value': '0'},
                           {'name': 'flAllocationType',
                            'value': '0x00003000'},
                           {'name': 'flProtect', 'value': '0x00000040'}],
             'repeated': 0,
             'return': '',
             'status': 'FAILURE',
             'timestamp': '20111219100546.734'},
            {'api': 'ExitProcess',
             'arguments': [{'name': 'uExitCode', 'value': '0x00000000'}],
             'repeated': 0,
             'return': '',
             'status': '',
             'timestamp': '20111219100546.744'}],
  'first_seen': '20111219100536.679',
  'process_id': '764',
  'process_name': 'binary.exe'}]
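
Because the result is made of plain lists and dictionaries, it can be searched with ordinary Python. As a sketch, the hypothetical helper below (not part of Cuckoo) collects the calls that did not succeed. The sample list is a trimmed copy of the structure shown above and stands in for a real Analysis(...).process() result.

```python
def failed_calls(processes):
    """Return (process_name, api, arguments) for every call marked FAILURE.

    Hypothetical helper; `processes` is expected to have the shape of the
    result of Analysis.process() as shown above.
    """
    failures = []
    for process in processes:
        for call in process["calls"]:
            if call["status"] == "FAILURE":
                # Flatten the argument list into a name -> value mapping.
                arguments = dict(
                    (arg["name"], arg["value"]) for arg in call["arguments"]
                )
                failures.append((process["process_name"], call["api"], arguments))
    return failures


# Trimmed stand-in for a real result, copied from the output above.
sample = [{
    "process_id": "764",
    "process_name": "binary.exe",
    "first_seen": "20111219100536.679",
    "calls": [
        {"api": "LoadLibraryA",
         "arguments": [{"name": "lpFileName", "value": "KERNEL32.DLL"}],
         "repeated": 0, "return": "0x7c800000", "status": "SUCCESS",
         "timestamp": "20111219100536.679"},
        {"api": "VirtualAllocEx",
         "arguments": [{"name": "th32ProcessID", "value": "1548"},
                       {"name": "szExeFile", "value": "svchost.exe"}],
         "repeated": 0, "return": "", "status": "FAILURE",
         "timestamp": "20111219100546.734"},
    ],
}]

for name, api, arguments in failed_calls(sample):
    print("%s: %s failed against %s" % (name, api, arguments["szExeFile"]))
```

The same loop structure works for any other filter, for instance selecting only the calls to a given API or those made after a certain timestamp.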

Using the normalized data generated by the Analysis class, you can also build a tree with the ProcessTree class, which arranges the monitored processes recursively into parents and children.

Following is an example usage and output:

>>> import pprint
>>> from cuckoo.processing.analysis import Analysis, ProcessTree
>>> results = Analysis("analysis/2/logs/").process()
>>> tree = ProcessTree(results).process()
>>> pprint.pprint(tree)
[{'children': [{'children': [], 'name': 'kadef.exe', 'pid': 788},
               {'children': [], 'name': 'cmd.exe', 'pid': 1764}],
  'name': 'malware.exe',
  'pid': 1488}]
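
Since every node carries its own list of children, the tree lends itself to a recursive walk. The sketch below uses a hypothetical generator, walk_tree (not part of Cuckoo), to flatten the tree into (depth, pid, name) tuples and print it with indentation. The tree variable is copied from the output above and stands in for a real ProcessTree result.

```python
def walk_tree(nodes, depth=0):
    """Yield (depth, pid, name) for every process, parents before children.

    Hypothetical helper; `nodes` is expected to have the shape of the list
    returned by ProcessTree.process() as shown above.
    """
    for node in nodes:
        yield depth, node["pid"], node["name"]
        # Recurse into the children one level deeper.
        for item in walk_tree(node["children"], depth + 1):
            yield item


# Stand-in for a real result, copied from the output above.
tree = [{"children": [{"children": [], "name": "kadef.exe", "pid": 788},
                      {"children": [], "name": "cmd.exe", "pid": 1764}],
         "name": "malware.exe",
         "pid": 1488}]

for depth, pid, name in walk_tree(tree):
    print("%s%s (%d)" % ("  " * depth, name, pid))
```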

Processing network traffic

In the exact same way as you can process behavioral results, you can also process network traffic from the PCAP file using the Pcap class available from cuckoo.processing.pcap.

At the current stage it returns a dictionary with information on DNS and HTTP requests as well as all UDP and TCP packets.

Following is an example usage and output:

>>> import pprint
>>> from cuckoo.processing.pcap import Pcap
>>> network = Pcap("analysis/3/dump.pcap").process()
>>> pprint.pprint(network)
{'dns': [{'hostname': 'www.google.com', 'ip': '74.125.127.104'}],
 'http': [{'body': '',
           'data': 'GET / HTTP/1.1\r\nHost: www.google.com\r\nUser-Agent: Mozilla/5.0 (Windows NT 5.1; rv:6.0.2) Gecko/20100101 Firefox/6.0.2\r\nAccept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\nAccept-Language: en-us,en;q=0.5\r\nAccept-Encoding: gzip, deflate\r\nAccept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\nConnection: keep-alive\r\n\r\n',
           'host': 'www.google.com',
           'method': 'GET',
           'path': '/',
           'port': 80,
           'uri': 'http://www.google.com/',
           'user-agent': 'Mozilla/5.0 (Windows NT 5.1; rv:6.0.2) Gecko/20100101 Firefox/6.0.2',
           'version': '1.1'}],
 'tcp': [{'dport': 80,
          'dst': '74.125.127.104',
          'sport': 1214,
          'src': '10.0.2.15'}],
 'udp': [{'dport': 67,
          'dst': '255.255.255.255',
          'sport': 68,
          'src': '0.0.0.0'}]}
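
A common follow-up is to reduce this dictionary to the hosts the sample talked to. The sketch below does that with a hypothetical helper, contacted_hosts (not part of Cuckoo), which relies only on the dns and http keys shown above. The network variable is a trimmed copy of that output and stands in for a real Pcap result.

```python
def contacted_hosts(network):
    """Map each hostname to the IPs it resolved to and the URIs requested.

    Hypothetical helper; `network` is expected to have the shape of the
    dictionary returned by Pcap.process() as shown above.
    """
    hosts = {}
    for query in network.get("dns", []):
        entry = hosts.setdefault(query["hostname"], {"ips": set(), "uris": set()})
        entry["ips"].add(query["ip"])
    for request in network.get("http", []):
        entry = hosts.setdefault(request["host"], {"ips": set(), "uris": set()})
        entry["uris"].add(request["uri"])
    return hosts


# Trimmed stand-in for a real result, copied from the output above.
network = {
    "dns": [{"hostname": "www.google.com", "ip": "74.125.127.104"}],
    "http": [{"host": "www.google.com", "method": "GET", "path": "/",
              "port": 80, "uri": "http://www.google.com/", "version": "1.1"}],
    "tcp": [{"dport": 80, "dst": "74.125.127.104",
             "sport": 1214, "src": "10.0.2.15"}],
    "udp": [{"dport": 67, "dst": "255.255.255.255",
             "sport": 68, "src": "0.0.0.0"}],
}

for hostname, seen in sorted(contacted_hosts(network).items()):
    print("%s -> %s %s" % (hostname, sorted(seen["ips"]), sorted(seen["uris"])))
```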

Putting it all together

If you don’t want to bother invoking all the necessary classes but just want a comprehensive (and huge) dictionary containing everything you need, you can simply use the CuckooDict class provided by the module cuckoo.processing.data, just like the default processor does.

Following is an example usage and output:

>>> import pprint
>>> from cuckoo.processing.data import CuckooDict
>>> analysis = CuckooDict("analysis/2/").process()
>>> pprint.pprint(analysis)
{'behavior': {'processes': [<results provided by class Analysis>],
              'processtree': [<results provided by class ProcessTree>]},
 'debug': {'log': '<content of analysis.log file>'},
 'dropped': [<results provided by class File on all dropped files>],
 'file': {<results provided by class File on the analyzed file>},
 'info': {'duration': '38846 seconds',
          'started': '2011-12-19 11:05:06',
          'version': 'v0.3'},
 'network': {<results provided by class Pcap>},
 'static': {}}

The results have been stripped from the output above.
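
As a sketch of what a customized processor might do with this dictionary, the hypothetical function below (not part of Cuckoo) condenses it into a short JSON summary. It assumes only the keys visible in the outputs on this page. The analysis variable is a tiny hand-written stand-in for a real CuckooDict result.

```python
import json


def summarize(analysis):
    """Condense a CuckooDict-style dictionary into a small JSON document.

    Hypothetical helper; it reads only keys that appear in the example
    outputs on this page.
    """
    processes = analysis["behavior"]["processes"]
    dns_queries = analysis["network"].get("dns", [])
    summary = {
        "sample": analysis["file"].get("name"),
        "md5": analysis["file"].get("md5"),
        "started": analysis["info"]["started"],
        "processes": [process["process_name"] for process in processes],
        "api_calls": sum(len(process["calls"]) for process in processes),
        "dropped_files": len(analysis["dropped"]),
        "dns_hostnames": sorted(set(query["hostname"] for query in dns_queries)),
    }
    return json.dumps(summary, indent=4, sort_keys=True)


# Hand-written stand-in for a real result, following the layout above.
analysis = {
    "behavior": {
        "processes": [{"process_id": "764", "process_name": "binary.exe",
                       "first_seen": "20111219100536.679",
                       "calls": [{"api": "ExitProcess", "arguments": [],
                                  "repeated": 0, "return": "", "status": "",
                                  "timestamp": "20111219100546.744"}]}],
        "processtree": [],
    },
    "debug": {"log": ""},
    "dropped": [],
    "file": {"name": "malware.exe",
             "md5": "9b2de8b062a5538d2a126ba93835d1e9"},
    "info": {"duration": "38846 seconds",
             "started": "2011-12-19 11:05:06",
             "version": "v0.3"},
    "network": {"dns": [{"hostname": "www.google.com",
                         "ip": "74.125.127.104"}]},
    "static": {},
}

print(summarize(analysis))
```

In a real processor the analysis variable would instead come from CuckooDict(analysis_path).process(), as in the default script at the top of this page.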