Processing of results

As explained in the Configuration chapter, once an analysis is completed Cuckoo invokes a script that can access and manipulate the results produced. The script is meant to be customized by the user to do whatever they need.

This script (called the “processor”) is invoked concurrently with Cuckoo, which makes it completely independent from the sandbox execution, and it takes the path to the analysis results as an argument.

The default processor looks like the following:

import sys
from optparse import OptionParser

from cuckoo.processing.data import CuckooDict
from cuckoo.reporting.reporter import ReportProcessor
from cuckoo.logging.crash import crash

def main():
    analysis_path = None

    parser = OptionParser(usage="usage: %prog [options] analysispath")
    parser.add_option("-m", "--message",
                      action="store",
                      type="string",
                      dest="message",
                      default=None,
                      help="Specify a message to notify to the processor script")
    parser.add_option("-c", "--custom",
                      action="store",
                      type="string",
                      dest="custom",
                      default=None,
                      help="Specify a custom value to be used by the processor")

    (options, args) = parser.parse_args()

    if len(args) == 1:
        try:
            analysis_path = args[0]
        except IndexError, why:
            pass

    if options.message:
        print options.message

    if options.custom:
        print options.custom

    if analysis_path:
        # Generate reports out of abstracted analysis results.
        ReportProcessor(analysis_path).report(CuckooDict(analysis_path).process())

    return True

if __name__ == "__main__":
    try:
        main()
    except KeyboardInterrupt:
        print "User aborted."
    except SystemExit:
        pass
    except:
        crash()

It obtains a dictionary from the raw results and triggers the generation of the enabled reports, as explained in Configuration.

To simplify some of the processing tasks you might need to perform, Cuckoo provides some ready-to-use functions and classes, which are generally located in “cuckoo/processing/”.

Retrieving details on a file

The first thing you might be interested in is retrieving some details on the binary you just analyzed. For this purpose there is a dedicated class called File, provided by the module cuckoo.processing.file. It takes the path to a file as argument, and invoking process() returns a dictionary containing some static details. You can use this class on any file you want, including dropped files.

The following is an example of usage and output:

>>> import pprint
>>> from cuckoo.processing.file import File
>>> details = File("analysis/1/malware.exe").process()
>>> pprint.pprint(details)
{'crc32': '76652E7B',
 'md5': '9b2de8b062a5538d2a126ba93835d1e9',
 'name': 'malware.exe',
 'sha1': 'f3b2025f64aaec787b1009223927b78b1677b92a',
 'sha256': '676a818365c573e236245e8182db87ba1bc021c5d8ee7443b9f673f26e7fd7d1',
 'sha512': '807142b3141bddbf5b2c2be78ff755433fca67b3f78ea7c5f7e74001614097a2bf439d90fa6ab415e736c59829be40d8c220f60117478e1a1ee372a97faa8fcb',
 'size': 194560,
 'ssdeep': '3072:J9GgqeRehDMVYQKSGJhJX11o0wojolTmXJmfEaQHNo8+PZ7ya4aMi4ry0zxLbnJG:J9JqeohDMODSGFX11o0wo0AJ4+a82Z7U',
 'type': 'PE32 executable for MS Windows (GUI) Intel 80386 32-bit'}
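
The hashes in this dictionary can be used to check that a file on disk is the one that was analyzed. The following is a minimal sketch of that idea, not part of Cuckoo: the names hash_stream and matches_details are hypothetical, it relies only on Python's standard hashlib module, and it assumes a dictionary with the 'md5', 'sha1' and 'sha256' keys shown above. An in-memory stream stands in for a real sample.

```python
import hashlib
import io


def hash_stream(stream, chunk_size=65536):
    """Return MD5, SHA-1 and SHA-256 hex digests of a binary stream."""
    digests = {
        "md5": hashlib.md5(),
        "sha1": hashlib.sha1(),
        "sha256": hashlib.sha256(),
    }
    # Read in chunks so that large samples are not loaded into memory at once.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        for digest in digests.values():
            digest.update(chunk)
    return {name: digest.hexdigest() for name, digest in digests.items()}


def matches_details(stream, details):
    """Check a stream against the hashes of a File(...).process() style dict."""
    computed = hash_stream(stream)
    return all(computed[name] == details[name].lower() for name in computed)


# In-memory stand-in for a sample; with a real file use open(path, "rb").
fake_sample = io.BytesIO(b"MZ\x90\x00 not a real executable")
recorded = hash_stream(fake_sample)  # plays the role of the details dict
fake_sample.seek(0)
print(matches_details(fake_sample, recorded))
```

With a real analysis, the second argument would be the dictionary returned by File(...).process() and the stream would be the dropped or submitted file opened in binary mode.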

Processing behavioral analysis results

As you read in Analysis Results, Cuckoo generates CSV-like raw logs for every process it monitored. These logs contain all the Win32 API calls that Cuckoo was able to intercept while tracking the processes. To make this information more accessible, you can use the BehaviorAnalysis class provided by the module cuckoo.processing.analysis.

This class takes the path to the log files as argument, and calling its process() method returns the behavioral results in a structured format: a list with one dictionary per monitored process.

The following is an example of usage and output:

>>> import pprint
>>> from cuckoo.processing.analysis import BehaviorAnalysis
>>> results = BehaviorAnalysis("analysis/1/logs/").process()
>>> pprint.pprint(results)
[{'calls': [{'api': 'LoadLibraryA',
             'arguments': [{'name': 'lpFileName', 'value': 'KERNEL32.DLL'}],
             'repeated': 0,
             'return': '0x7c800000',
             'status': 'SUCCESS',
             'timestamp': '20111219100536.679'},

            [...]

            {'api': 'VirtualAllocEx',
             'arguments': [{'name': 'th32ProcessID', 'value': '764'},
                           {'name': 'szExeFile', 'value': 'binary.exe'},
                           {'name': 'lpAddress', 'value': '0x00000000'},
                           {'name': 'dwSize', 'value': '4826'},
                           {'name': 'flAllocationType',
                            'value': '0x00003000'},
                           {'name': 'flProtect', 'value': '0x00000040'}],
             'repeated': 0,
             'return': '0x00150000',
             'status': 'SUCCESS',
             'timestamp': '20111219100536.679'},
            {'api': 'CreateFileW',
             'arguments': [{'name': 'lpFileName',
                            'value': 'C:\\WINDOWS\\system32\\svchost.exe'},
                           {'name': 'dwDesiredAccess',
                            'value': 'GENERIC_READ'}],
             'repeated': 1,
             'return': '0x000000b4',
             'status': 'SUCCESS',
             'timestamp': '20111219100546.734'},
            {'api': 'CreateProcessA',
             'arguments': [{'name': 'lpApplicationName',
                            'value': '(null)'},
                           {'name': 'lpCommandLine',
                            'value': 'svchost.exe'}],
             'repeated': 0,
             'return': '1548',
             'status': 'SUCCESS',
             'timestamp': '20111219100546.734'},
            {'api': 'VirtualAllocEx',
             'arguments': [{'name': 'th32ProcessID', 'value': '1548'},
                           {'name': 'szExeFile', 'value': 'svchost.exe'},
                           {'name': 'lpAddress', 'value': '0x00000000'},
                           {'name': 'dwSize', 'value': '0'},
                           {'name': 'flAllocationType',
                            'value': '0x00003000'},
                           {'name': 'flProtect', 'value': '0x00000040'}],
             'repeated': 0,
             'return': '',
             'status': 'FAILURE',
             'timestamp': '20111219100546.734'},
            {'api': 'ExitProcess',
             'arguments': [{'name': 'uExitCode', 'value': '0x00000000'}],
             'repeated': 0,
             'return': '',
             'status': '',
             'timestamp': '20111219100546.744'}],
  'first_seen': '20111219100536.679',
  'process_id': '764',
  'process_name': 'binary.exe'}]
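
Once the results are in this form, ordinary Python is enough to query them. The following is a sketch under the assumption that every call entry carries the 'api' and 'status' keys shown above; the helper names api_histogram and failed_calls are hypothetical, and a trimmed copy of the output replaces a real BehaviorAnalysis run.

```python
from collections import Counter

# Trimmed sample following the structure shown above. In a processor this
# would be the list returned by BehaviorAnalysis(...).process().
sample_results = [
    {
        "process_id": "764",
        "process_name": "binary.exe",
        "first_seen": "20111219100536.679",
        "calls": [
            {"api": "LoadLibraryA", "status": "SUCCESS"},
            {"api": "VirtualAllocEx", "status": "SUCCESS"},
            {"api": "CreateFileW", "status": "SUCCESS"},
            {"api": "VirtualAllocEx", "status": "FAILURE"},
            {"api": "ExitProcess", "status": ""},
        ],
    }
]


def api_histogram(results):
    """Count how many logged entries exist for each API name."""
    counts = Counter()
    for process in results:
        for call in process["calls"]:
            counts[call["api"]] += 1
    return counts


def failed_calls(results):
    """Return (process name, API name) pairs for calls logged as FAILURE."""
    return [
        (process["process_name"], call["api"])
        for process in results
        for call in process["calls"]
        if call["status"] == "FAILURE"
    ]


print(api_histogram(sample_results).most_common(1))
print(failed_calls(sample_results))
```

The histogram counts log entries only; it deliberately ignores the 'repeated' field, whose exact semantics are not described here.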

Using the normalized data generated by the BehaviorAnalysis class, you can also build a tree with the ProcessTree class, which arranges the monitored processes recursively, with each process listing its children.

The following is an example of usage and output:

>>> import pprint
>>> from cuckoo.processing.analysis import BehaviorAnalysis, ProcessTree
>>> results = BehaviorAnalysis("analysis/2/logs/").process()
>>> tree = ProcessTree(results).process()
>>> pprint.pprint(tree)
[{'children': [{'children': [], 'name': 'kadef.exe', 'pid': 788},
               {'children': [], 'name': 'cmd.exe', 'pid': 1764}],
  'name': 'malware.exe',
  'pid': 1488}]
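
Because each node lists its children under the 'children' key, the tree can be traversed with a short recursive generator. The following sketch assumes exactly the 'name', 'pid' and 'children' keys shown above; walk_tree is a hypothetical helper, and the sample is copied from the output rather than produced by Cuckoo.

```python
# Copy of the tree shown above; a processor would use the list returned
# by ProcessTree(results).process() instead.
sample_tree = [
    {
        "name": "malware.exe",
        "pid": 1488,
        "children": [
            {"name": "kadef.exe", "pid": 788, "children": []},
            {"name": "cmd.exe", "pid": 1764, "children": []},
        ],
    }
]


def walk_tree(nodes, depth=0):
    """Yield (depth, pid, name) for every process, parents before children."""
    for node in nodes:
        yield depth, node["pid"], node["name"]
        # Recurse into the children one level deeper.
        yield from walk_tree(node["children"], depth + 1)


# Print the tree with two spaces of indentation per level.
for depth, pid, name in walk_tree(sample_tree):
    print("{}{} (pid {})".format("  " * depth, name, pid))
```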

In the same way, you can use the normalized results to generate a summary with the BehaviorSummary class. The summary currently includes the accessed files, registry keys and mutexes.

The following is an example of usage and output:

>>> import pprint
>>> from cuckoo.processing.analysis import BehaviorAnalysis, BehaviorSummary
>>> results = BehaviorAnalysis("analysis/1/logs/").process()
>>> summary = BehaviorSummary(results).process()
>>> pprint.pprint(summary)
{'files': ['\\\\.\\PIPE\\lsarpc',
           'C:\\WINDOWS\\',
           '\\\\.\\MountPointManager',
           'C:\\debug.txt',
           'C:\\1d8f4af4e7bcec0c3a3cec09c23a64b1.exe',
           'C:\\Documents and Settings\\User\\Application Data\\Pywou\\ikxi.exe',
           'C:\\Documents and Settings\\User\\Application Data\\Leydav\\idyr.fud',
           'C:\\Documents and Settings\\User\\Application Data',
           'C:\\Documents and Settings\\User\\Application Data\\Pywou',
           'C:\\Documents and Settings\\User\\Application Data\\Leydav',
           'C:\\DOCUME~1\\Me\\LOCALS~1\\Temp\\tmpc8bd469a.bat',
           'C:\\WINDOWS\\system32\\cmd.exe'],
 'keys': ['HKEY_LOCAL_MACHINE\\\\Software\\Policies\\Microsoft\\Windows\\Safer\\CodeIdentifiers',
          '0x000000dc\\\\Software\\Policies\\Microsoft\\Windows\\Safer\\CodeIdentifiers',
          'HKEY_LOCAL_MACHINE\\\\Software\\Microsoft\\Command Processor',
          'HKEY_CURRENT_USER\\\\Software\\Microsoft\\Command Processor'],
 'mutexes': ['{8EEEA37C-5CEF-11DD-9810-2A4256D89593}',
             'Global\\{502CC696-1792-D22B-3DCD-DB4062D1CF32}',
             'Local\\{C19386EC-57E8-4394-3DCD-DB4062D1CF32}',
             'ShimCacheMutex',
             'Local\\{EA35E3DD-32D9-6832-3DCD-DB4062D1CF32}',
             'Global\\{0C34F9D1-28D5-8E33-4ED7-02E811CB169A}',
             'Global\\{0C34F9D1-28D5-8E33-42D5-02E81DC9169A}',
             'Global\\{0C34F9D1-28D5-8E33-02D5-02E85DC9169A}']}
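
The summary lists are plain strings, so they are easy to filter. As an illustration, the sketch below picks out accessed paths with executable-like extensions. The interesting_files helper and the set of extensions are hypothetical choices, and the sample is a shortened copy of the output above. Python's standard ntpath module is used because it parses Windows-style paths on any host operating system.

```python
import ntpath

# Shortened copy of the summary shown above; a processor would use the
# dictionary returned by BehaviorSummary(results).process() instead.
sample_summary = {
    "files": [
        "\\\\.\\PIPE\\lsarpc",
        "C:\\WINDOWS\\",
        "C:\\debug.txt",
        "C:\\Documents and Settings\\User\\Application Data\\Pywou\\ikxi.exe",
        "C:\\DOCUME~1\\Me\\LOCALS~1\\Temp\\tmpc8bd469a.bat",
        "C:\\WINDOWS\\system32\\cmd.exe",
    ],
    "keys": [],
    "mutexes": ["ShimCacheMutex"],
}

# Hypothetical choice of extensions worth a closer look.
INTERESTING_EXTENSIONS = {".exe", ".dll", ".bat", ".scr"}


def interesting_files(summary, extensions=INTERESTING_EXTENSIONS):
    """Return accessed paths whose extension is in `extensions`."""
    hits = []
    for path in summary["files"]:
        # splitext returns (root, extension); compare case-insensitively.
        extension = ntpath.splitext(path)[1].lower()
        if extension in extensions:
            hits.append(path)
    return hits


for path in interesting_files(sample_summary):
    print(path)
```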

Processing network traffic

Just as you can process behavioral results, you can also process the network traffic recorded in the PCAP file, using the Pcap class available from cuckoo.processing.pcap.

Currently it returns a dictionary with the information on DNS and HTTP requests, as well as all UDP and TCP packets.

The following is an example of usage and output:

>>> import pprint
>>> from cuckoo.processing.pcap import Pcap
>>> network = Pcap("analysis/3/dump.pcap").process()
>>> pprint.pprint(network)
{'dns': [{'hostname': 'www.google.com', 'ip': '74.125.127.104'}],
 'http': [{'body': '',
           'data': 'GET / HTTP/1.1\r\nHost: www.google.com\r\nUser-Agent: Mozilla/5.0 (Windows NT 5.1; rv:6.0.2) Gecko/20100101 Firefox/6.0.2\r\nAccept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\nAccept-Language: en-us,en;q=0.5\r\nAccept-Encoding: gzip, deflate\r\nAccept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\nConnection: keep-alive\r\n\r\n',
           'host': 'www.google.com',
           'method': 'GET',
           'path': '/',
           'port': 80,
           'uri': 'http://www.google.com/',
           'user-agent': 'Mozilla/5.0 (Windows NT 5.1; rv:6.0.2) Gecko/20100101 Firefox/6.0.2',
           'version': '1.1'}],
 'tcp': [{'dport': 80,
          'dst': '74.125.127.104',
          'sport': 1214,
          'src': '10.0.2.15'}],
 'udp': [{'dport': 67,
          'dst': '255.255.255.255',
          'sport': 68,
          'src': '0.0.0.0'}]}
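
A common next step is to reduce this dictionary to a list of network indicators. The sketch below assumes the 'dns', 'http', 'tcp' and 'udp' keys and the per-entry fields shown above; network_indicators is a hypothetical helper, and the sample is a shortened copy of the output (the raw 'data' field is omitted).

```python
# Shortened copy of the output shown above; a processor would use the
# dictionary returned by Pcap(...).process() instead.
sample_network = {
    "dns": [{"hostname": "www.google.com", "ip": "74.125.127.104"}],
    "http": [{"host": "www.google.com", "method": "GET", "path": "/",
              "port": 80, "uri": "http://www.google.com/"}],
    "tcp": [{"dport": 80, "dst": "74.125.127.104",
             "sport": 1214, "src": "10.0.2.15"}],
    "udp": [{"dport": 67, "dst": "255.255.255.255",
             "sport": 68, "src": "0.0.0.0"}],
}


def network_indicators(network):
    """Collect unique hostnames, requested URIs and destination endpoints."""
    return {
        "hostnames": sorted({e["hostname"] for e in network.get("dns", [])}),
        "uris": sorted({e["uri"] for e in network.get("http", [])}),
        # Destination (address, port) pairs seen over TCP or UDP.
        "endpoints": sorted(
            {(e["dst"], e["dport"])
             for proto in ("tcp", "udp")
             for e in network.get(proto, [])}
        ),
    }


indicators = network_indicators(sample_network)
print(indicators["hostnames"])
print(indicators["endpoints"])
```

Using sets before sorting removes duplicates, which matters on real captures where the same endpoint appears in many packets.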

Putting it all together

If you don’t want to bother invoking all the necessary classes and just want a comprehensive (and huge) dictionary containing everything you need, you can simply use the CuckooDict class provided by the module cuckoo.processing.data, just like the default processor does.

The following is an example of usage and output:

>>> import pprint
>>> from cuckoo.processing.data import CuckooDict
>>> analysis = CuckooDict("analysis/2/").process()
>>> pprint.pprint(analysis)
{'behavior': {'processes': [<results provided by class BehaviorAnalysis>],
              'processtree': [<results provided by class ProcessTree>],
              'summary': [<results provided by class BehaviorSummary>]},
 'debug': {'log': '<content of analysis.log file>'},
 'dropped': [<results provided by class File on all dropped files>],
 'file': {<results provided by class File on the analyzed file>},
 'info': {'duration': '165 seconds',
          'ended': '2012-01-31 23:03:15',
          'started': '2012-01-31 23:01:09',
          'version': 'v0.3.2'},
 'network': {<results provided by class Pcap>},
 'static': {<results provided by the static analysis classes>},
 'screenshots': {<screenshots taken during analysis>}}

The actual results have been stripped from the output above and replaced with placeholders.
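
A custom processor will often condense this dictionary into something smaller. The following sketch builds a short overview from the top-level keys listed above. The overview function is hypothetical, the sample dictionary is a mock that reuses values from the outputs in this chapter, and missing sections are tolerated by falling back to empty defaults.

```python
import json

# Mock dictionary following the layout shown above; a processor would use
# the result of CuckooDict(analysis_path).process() instead.
sample_analysis = {
    "info": {"duration": "165 seconds",
             "started": "2012-01-31 23:01:09",
             "ended": "2012-01-31 23:03:15",
             "version": "v0.3.2"},
    "file": {"name": "malware.exe"},
    "dropped": [],
    "behavior": {"processes": [{"process_name": "malware.exe", "calls": []}],
                 "processtree": [],
                 "summary": {}},
    "network": {"dns": [{"hostname": "www.google.com",
                         "ip": "74.125.127.104"}]},
}


def overview(analysis):
    """Reduce the full results dictionary to a handful of key facts."""
    behavior = analysis.get("behavior", {})
    network = analysis.get("network", {})
    return {
        "sample": analysis.get("file", {}).get("name"),
        "duration": analysis.get("info", {}).get("duration"),
        "processes": len(behavior.get("processes", [])),
        "dropped_files": len(analysis.get("dropped", [])),
        "dns_hostnames": [e["hostname"] for e in network.get("dns", [])],
    }


print(json.dumps(overview(sample_analysis), indent=2, sort_keys=True))
```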