Processing Modules

Cuckoo’s processing modules are Python scripts that let you define custom ways to analyze the raw results generated by the sandbox and append information to a global container that is later used by the reporting modules.

You can create as many modules as you want, as long as they follow the predefined structure presented in this chapter.

Global Container

After an analysis is completed, Cuckoo will invoke all the processing modules available in the modules/processing/ directory.

Every module is then initialized and executed, and the data it returns is appended to a data structure that we’ll call the global container.

This container is simply a large Python dictionary that holds the abstracted results produced by all the modules, organized by their defined keys.
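
How module results end up in the container can be pictured with a minimal sketch. The Processing stand-in, the two example modules and the build_container() helper below are hypothetical and only mimic the behavior described above; they are not Cuckoo's actual scheduling code, and the returned values are made up.

```python
class Processing(object):
    """Hypothetical stand-in for lib.cuckoo.common.abstracts.Processing."""
    key = None

    def run(self):
        raise NotImplementedError


class Info(Processing):
    def run(self):
        self.key = "info"
        return {"version": "x.y", "duration": 42}  # made-up values


class Dropped(Processing):
    def run(self):
        self.key = "dropped"
        return [{"name": "a.tmp", "size": 10}]  # made-up values


def build_container(module_classes):
    """Run every module and store its result under the key it declares."""
    container = {}
    for cls in module_classes:
        module = cls()
        data = module.run()  # run() sets module.key as a side effect
        container[module.key] = data
    return container


results = build_container([Info, Dropped])
print(sorted(results.keys()))
```

Each module contributes exactly one top-level entry, which is why two modules declaring the same key would overwrite each other in a structure like this.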

Cuckoo ships with a default set of modules that generate a standard global container. The existing reporting modules (HTML report etc.) depend on these default modules being left unmodified; otherwise the structure of the global container would change, and the reporting modules would no longer be able to recognize it and extract the information used to build the final report.

Following is a JSON-like representation of a default global container:

{
    "info": {
        "started": <timestamp>,
        "ended": <timestamp>,
        "duration": <duration in seconds>,
        "version": <version of Cuckoo>
    },
    "signatures": [
        {
            "severity": <severity level>,
            "description": <signature description>,
            "alert": <boolean value>,
            "references": [<any reference link>],
            "data": [<any contextual data>],
            "name": <signature name>
        }
    ],
    "behavior": {
        "processes": [
            {
                "parent_id": <parent PID>,
                "process_name": <process name>,
                "process_id": <PID>,
                "first_seen": <timestamp when the process was first seen>,
                "calls": [
                    {
                        "category": <API function category>,
                        "status": <SUCCESS or FAILURE>,
                        "return": <any returned value>,
                        "timestamp": <timestamp of the call>,
                        "repeated": <how many times it was repeated consecutively>,
                        "api": <API function>,
                        "arguments": [
                            {
                                "name": <argument name>,
                                "value": <argument value>
                            }
                        ]
                    },
                    <...>
                ],
                <...>
            }
        ],
        "processtree": [
            {
                "pid": <PID>,
                "name": <process name>,
                "children": [<recursive child entries>]
            }
        ],
        "summary": {
            "files": [<list of files accessed>],
            "keys": [<list of registry keys accessed>],
            "mutexes": [<list of mutexes accessed>]
        }
    },
    "static": {<static analysis if available for the file type>},
    "dropped": [
        {
            "size": <file size>,
            "sha1": <SHA1 hash>,
            "name": <file name>,
            "type": <file type>,
            "crc32": <CRC32 hash>,
            "ssdeep": <Ssdeep hash>,
            "sha256": <SHA256 hash>,
            "sha512": <SHA512 hash>,
            "md5": <MD5 hash>
        },
        <...>
    ],
    "file": {
        "size": <file size>,
        "sha1": <SHA1 hash>,
        "name": <file name>,
        "type": <file type>,
        "crc32": <CRC32 hash>,
        "ssdeep": <Ssdeep hash>,
        "sha256": <SHA256 hash>,
        "sha512": <SHA512 hash>,
        "md5": <MD5 hash>
    },
    "debug": {
        "log": <content of analysis.log>
    },
    "network": {
        "http": [
            {
                "body": <request body>,
                "uri": <request URI>,
                "method": <request method>,
                "host": <host name>,
                "version": <HTTP version>,
                "path": <path of the request>,
                "data": <dump of whole request>,
                "port": <port>
            },
            <...>
        ],
        "udp": [
            {
                "dport": <destination port>,
                "src": <source IP>,
                "dst": <destination IP>,
                "sport": <source port>
            },
            <...>
        ],
        "hosts": [<list of involved IP addresses>],
        "dns": [
            {
                "ip": <IP address>,
                "hostname": <domain name>
            },
            <...>
        ],
        "tcp": [
            {
                "dport": <destination port>,
                "src": <source IP>,
                "dst": <destination IP>,
                "sport": <source port>
            },
            <...>
        ]
    }
}

Every processing module added will end up with a dedicated dictionary entry in this data structure.
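
A reporting module, or any other consumer, reads this structure with ordinary dictionary access. The sketch below uses a small hand-written container that follows the layout above (all values are invented for illustration) and a hypothetical helper, failed_calls(), that collects the API calls that failed for each process.

```python
# Hand-written excerpt of a global container; the values are invented.
results = {
    "behavior": {
        "processes": [
            {
                "process_id": 1234,
                "process_name": "sample.exe",
                "calls": [
                    {"api": "CreateFileW", "status": "SUCCESS"},
                    {"api": "RegOpenKeyExW", "status": "FAILURE"},
                ],
            }
        ]
    },
    "signatures": [],
}


def failed_calls(container):
    """Map each PID to the names of the API calls that returned FAILURE."""
    failures = {}
    # .get() with defaults keeps the helper safe if a section is missing.
    for process in container.get("behavior", {}).get("processes", []):
        names = [
            call["api"]
            for call in process.get("calls", [])
            if call.get("status") == "FAILURE"
        ]
        failures[process["process_id"]] = names
    return failures


summary = failed_calls(results)
print(summary)
```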

Getting started

All processing modules are, and should be, placed in modules/processing/. In this directory you will find a set of default modules that are used to produce the traditional Cuckoo analysis reports.

A basic processing module could look like:

from lib.cuckoo.common.abstracts import Processing

class MyModule(Processing):

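    def run(self):
        # Name of the subcontainer that will hold the returned data.
        self.key = "file"
        # do_something() is a placeholder for your own analysis logic.
        data = do_something()
        return data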

Every processing module should contain:

  • A class inheriting from Processing.
  • A run() method.
  • A self.key attribute defining the name to be used as a subcontainer for the returned data.
  • A return value (list, dictionary, string etc.) that will be appended to the global container.
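
To make these requirements concrete, here is a minimal sketch in which the do_something() placeholder is replaced by real work: hashing the analyzed file. The Processing class is a hypothetical stand-in so that the example runs outside Cuckoo, the key "file_hashes" is invented for this example, and the temporary file only simulates a submitted sample.

```python
import hashlib
import os
import tempfile


class Processing(object):
    """Hypothetical stand-in for lib.cuckoo.common.abstracts.Processing."""
    file_path = None


class FileHashes(Processing):
    """Compute a few digests of the analyzed file."""

    def run(self):
        self.key = "file_hashes"  # invented key, not a Cuckoo default
        with open(self.file_path, "rb") as handle:
            content = handle.read()
        return {
            "size": len(content),
            "md5": hashlib.md5(content).hexdigest(),
            "sha256": hashlib.sha256(content).hexdigest(),
        }


# Simulate a submitted sample with a temporary file.
fd, sample_path = tempfile.mkstemp()
os.write(fd, b"hello")
os.close(fd)

module = FileHashes()
module.file_path = sample_path  # inside Cuckoo this attribute is provided
data = module.run()
os.remove(sample_path)
print(data["size"])
```

Inside Cuckoo the import line from the basic module above would replace the stand-in class, and the returned dictionary would appear in the global container under the chosen key.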

You can also specify an order value, which controls the sequence in which the available processing modules run. By default all modules have an order value of 1 and are executed in alphabetical order.

If you want to change this value, your module would look like:

from lib.cuckoo.common.abstracts import Processing

class MyModule(Processing):
    order = 2  # run after the modules that keep the default order of 1

    def run(self):
        self.key = "file"
        data = do_something()
        return data
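
The resulting execution sequence can be sketched as a sort on the order value, with the name as a tie-breaker. That the alphabetical tie-break uses the class name is an assumption made for this illustration, and the classes and the execution_sequence() helper below are hypothetical.

```python
class Processing(object):
    """Hypothetical stand-in for the real base class."""
    order = 1  # default order value


class Network(Processing):
    pass


class Behavior(Processing):
    pass


class MyModule(Processing):
    order = 2


def execution_sequence(module_classes):
    """Sort by order value first, then alphabetically by class name."""
    return sorted(module_classes, key=lambda cls: (cls.order, cls.__name__))


sequence = [cls.__name__ for cls in execution_sequence([MyModule, Network, Behavior])]
print(sequence)  # ['Behavior', 'Network', 'MyModule']
```

A higher order value is useful when a module needs to run after others, for example to post-process files they produced.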

The processing modules are provided with some attributes that can be used to access the raw results for the given analysis:

  • self.analysis_path: path to the folder containing the results (e.g. storage/analysis/1).
  • self.log_path: path to the analysis.log file.
  • self.conf_path: path to the analysis.conf file.
  • self.file_path: path to the analyzed file.
  • self.dropped_path: path to the folder containing the dropped files.
  • self.logs_path: path to the folder containing the raw behavioral logs.
  • self.shots_path: path to the folder containing the screenshots.
  • self.pcap_path: path to the network pcap dump.
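
As an illustration of how these attributes might be used, the following sketch summarizes the contents of the dropped files folder. The Processing stand-in, the "dropped_summary" key and the "files" subfolder name are assumptions made so that the example runs on its own; inside Cuckoo the attributes are already populated when run() is called.

```python
import os
import shutil
import tempfile


class Processing(object):
    """Hypothetical stand-in for lib.cuckoo.common.abstracts.Processing."""


class DroppedSummary(Processing):
    """Report how many files were dropped and their total size."""

    def run(self):
        self.key = "dropped_summary"  # invented key, not a Cuckoo default
        names = sorted(os.listdir(self.dropped_path))
        total = sum(
            os.path.getsize(os.path.join(self.dropped_path, name))
            for name in names
        )
        return {"count": len(names), "total_size": total, "names": names}


# Build a throwaway analysis folder; the "files" subfolder name is assumed.
analysis_path = tempfile.mkdtemp()
dropped_path = os.path.join(analysis_path, "files")
os.mkdir(dropped_path)
for name, payload in (("a.bin", b"abc"), ("b.bin", b"12345")):
    with open(os.path.join(dropped_path, name), "wb") as handle:
        handle.write(payload)

module = DroppedSummary()
module.analysis_path = analysis_path
module.dropped_path = dropped_path
summary = module.run()
shutil.rmtree(analysis_path)  # clean up the temporary folder
print(summary)
```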

Example

A good example for understanding the mechanics behind this is the Yara module. Yara is a tool and library used to match user-defined signatures containing static binary patterns against the analyzed file.

import os
import logging

try:
    import yara
    HAVE_YARA = True
except ImportError:
    HAVE_YARA = False

from lib.cuckoo.common.constants import CUCKOO_ROOT
from lib.cuckoo.common.abstracts import Processing

log = logging.getLogger(__name__)

class YaraSignatures(Processing):
    """Yara signature processing."""

    def run(self):
        """Run Yara processing.
        @return: list of matches.
        """
        self.key = "yara"
        matches = []

        if HAVE_YARA:
            try:
                rules = yara.compile(filepath=os.path.join(CUCKOO_ROOT, "data", "yara", "index.yar"))
                for match in rules.match(self.file_path):
                    matches.append({"name" : match.rule, "meta" : match.meta})
            except yara.Error as e:
                log.warning("Unable to match Yara signatures: %s", e)
        else:
            log.warning("Yara is not installed, skip")

        return matches

As you can see, the first statement in the run() function defines the key name for the module. The function then compiles the signatures file and matches every signature against the file located at self.file_path. The matched signatures are appended to the matches list, which is then returned and included in the global container under the “yara” section.