Using Access Logs

This page describes how to convert your existing access logs into a format that Spherical Defense can ingest and learn from.

Common Log Format (Apache)

The Common Log Format is the standard Apache logging format for web server access logging. This format contains the following fields:

  1. The IP address of the client (remote host) which made the request to the server.

  2. The RFC 1413 identity of the client. Usually "-".

  3. The UserId of the person requesting the document. Usually "-" unless .htaccess has requested authentication.

  4. The date, time, and time zone at which the request was received, by default in strftime format %d/%b/%Y:%H:%M:%S %z.

  5. The request line from the client. In the example below, GET is the method, /apache_pb.gif is the requested resource, and HTTP/1.0 is the HTTP protocol version.

  6. The HTTP status code returned to the client. 2xx is a successful response, 3xx a redirection, 4xx a client error, and 5xx a server error.

  7. The size of the object returned to the client, measured in bytes.

127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
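As an illustration of fields 4 and 6, the following minimal sketch parses the timestamp from the example line with the strftime pattern quoted above and maps a status code to its class. The helper names parse_clf_timestamp and status_class are hypothetical and are not part of Spherical Defense; the sketch also assumes an English or C locale, because %b matches locale-specific month abbreviations.

```python
from datetime import datetime

# strftime/strptime pattern quoted in field 4 above.
CLF_TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def parse_clf_timestamp(raw):
    """Parse a Common Log Format timestamp (without the square brackets).

    Hypothetical helper. Assumes an English or C locale, since %b is
    locale-dependent.
    """
    return datetime.strptime(raw, CLF_TIME_FORMAT)


def status_class(code):
    """Map an HTTP status code to the classes listed in field 6 above."""
    classes = {
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    return classes.get(int(code) // 100, "other")


timestamp = parse_clf_timestamp("10/Oct/2000:13:55:36 -0700")
print(timestamp.isoformat())  # 2000-10-10T13:55:36-07:00
print(status_class("200"))    # success
```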

To convert this file into a format that can be ingested by a running Spherical Defense instance, complete the following steps:

  1. SSH into your Spherical Defense instance.

  2. Create a new file in the ~/ directory called apache_log_converter.py

  3. Copy the following script into the file:

import argparse
import json
import re

parser = argparse.ArgumentParser(description="Convert Apache logs to JSON")
parser.add_argument("-s", "--source", type=str, required=True)
parser.add_argument("-t", "--target", type=str, required=True)

# One capture group per Common Log Format field:
# host, identity, user, [timestamp], "request line", status, size.
regex = re.compile(
    r'([^ ]*) ([^ ]*) ([^ ]*) \[([^]]*)\] "([^"]*)" ([^ ]*) ([^ ]*)'
)


def parse_line(line):
    """Return one log line as a JSON string, or "{}" if it does not match."""
    # Strip the trailing newline so it is not captured as part of the size field.
    m = regex.match(line.strip())
    if not m:
        return "{}"
    host, ignore, user, date, request, status, size = m.groups()
    # Split the request line into method, path, and protocol,
    # padding any missing part with "-".
    method_items = request.split(" ")
    method_items += ["-"] * (3 - len(method_items))
    _dict = {
        "host": host,
        "user": user,
        "timestamp": date,
        "method": method_items[0],
        "path": method_items[1],
        "protocol": method_items[2],
        "code": status,
        "size": size,
    }
    return json.dumps(_dict)


def translate_file(source, target):
    """Write one JSON object per source line to the target file."""
    with open(source, 'r') as _in, open(target, 'w+') as _out:
        for line in _in:
            _out.write(parse_line(line))
            _out.write("\n")


if __name__ == '__main__':
    args = parser.parse_args()
    translate_file(args.source, args.target)
    print("Converted {} to {}".format(args.source, args.target))

  4. Ensure that your Apache logs are in the same directory as the above script.

  5. Run the following command:

python3 apache_log_converter.py -s <SOURCE FILENAME> -t <TARGET FILENAME>

  6. After you have generated a file that can be ingested by the Spherical Defense data ingestor, go back to the AMI quickstart below to continue the on-boarding process.

Amazon Machine Image
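Before continuing, it may be worth checking how many lines were actually converted. The converter script writes an empty JSON object ("{}") for every line that does not match the Common Log Format, so counting those gives a rough measure of how well the source file fits the format. The sketch below is an optional check, not part of Spherical Defense; the function name summarize_converted is hypothetical.

```python
import json


def summarize_converted(path):
    """Count converted records and unmatched lines in a converter output file.

    Hypothetical helper: returns a (parsed, skipped) tuple, where skipped
    counts the "{}" placeholders written for lines that did not match.
    """
    parsed, skipped = 0, 0
    with open(path, "r") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            record = json.loads(line)
            if record:
                parsed += 1
            else:
                skipped += 1  # empty object: the source line did not match
    return parsed, skipped
```

Calling summarize_converted with the target filename passed to the converter returns the two counts. A high skipped count suggests the source file is not in Common Log Format.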

Combined Log Format (NGINX)

The Combined Log Format is the same as the Common Log Format above, with two extra fields appended to each line:

  • A referrer URL

  • A user-agent
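One possible way to handle the two extra fields is to extend the regular expression from the Apache converter with two more quoted groups. The sketch below is an assumption-laden illustration rather than a supported converter: the function name parse_combined_line, the JSON keys "referrer" and "user_agent", and the sample line are all hypothetical, and the key names expected by the Spherical Defense ingestor are not confirmed here. It also assumes that neither extra field contains an escaped double quote.

```python
import json
import re

# Common Log Format pattern followed by two quoted fields:
# the referrer URL and the user-agent.
COMBINED_REGEX = re.compile(
    r'([^ ]*) ([^ ]*) ([^ ]*) \[([^]]*)\] "([^"]*)" ([^ ]*) ([^ ]*)'
    r' "([^"]*)" "([^"]*)"'
)


def parse_combined_line(line):
    """Return one combined-format line as a JSON string, or "{}" on no match."""
    m = COMBINED_REGEX.match(line.strip())
    if not m:
        return "{}"
    host, _ident, user, date, request, status, size, referrer, agent = m.groups()
    # Split the request line into method, path, and protocol,
    # padding any missing part with "-".
    parts = request.split(" ")
    parts += ["-"] * (3 - len(parts))
    return json.dumps({
        "host": host,
        "user": user,
        "timestamp": date,
        "method": parts[0],
        "path": parts[1],
        "protocol": parts[2],
        "code": status,
        "size": size,
        "referrer": referrer,    # assumed key name
        "user_agent": agent,     # assumed key name
    })


# Illustrative sample line, not taken from a real log.
SAMPLE = (
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
    '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
    '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"'
)
print(parse_combined_line(SAMPLE))
```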

IIS Log Format

W3C Extended Log Format
