Using Access Logs

This page describes how to convert your existing access logs into a format that Spherical Defense can ingest and learn from.

Web server access logs rarely include HTTP request bodies such as POST payloads, so this approach is unsuitable if you expect inbound HTTP payloads in your traffic.

Common Log Format (Apache)

The Common Log Format is the standard Apache logging format for web server access logging. This format contains the following fields:

  1. The IP address of the client (remote host) which made the request to the server.

  2. The RFC 1413 identity of the client. Usually "-".

  3. The UserId of the person requesting the document. Usually "-" unless .htaccess has requested authentication.

  4. The date, time, and time zone that the request was received, by default in strftime format %d/%b/%Y:%H:%M:%S %z.

  5. The request line from the client, in double quotes. In the example below, GET is the method, /apache_pb.gif is the requested resource, and HTTP/1.0 is the protocol.

  6. The HTTP status code returned to the client. 2xx is a successful response, 3xx a redirection, 4xx a client error, and 5xx a server error.

  7. The size of the object returned to the client, measured in bytes.

127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
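To illustrate field 4, the timestamp from the example line can be parsed with Python's datetime.strptime using the same format directives. This is a minimal sketch and not part of the Spherical Defense tooling; it assumes an English or C locale, because %b matches locale-specific month abbreviations.

```python
from datetime import datetime

# Timestamp copied from the example log line above, without the brackets.
raw = "10/Oct/2000:13:55:36 -0700"

# Same directives as the default Apache format; %z parses the UTC offset.
parsed = datetime.strptime(raw, "%d/%b/%Y:%H:%M:%S %z")

# An offset-aware datetime can be printed in ISO 8601 form.
print(parsed.isoformat())
```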

To convert this file into a format that can be ingested by a running Spherical Defense instance, complete the following steps:

  1. SSH into your Spherical Defense instance.

  2. Create a new file called apache_log_converter.py in your home directory (~/).

  3. Copy the following script into the file:

import argparse
import json
import re

parser = argparse.ArgumentParser(description="Convert Apache logs to JSON")
parser.add_argument("-s", "--source", type=str, required=True)
parser.add_argument("-t", "--target", type=str, required=True)

# One capture group per Common Log Format field:
# host, identity, user, [timestamp], "request line", status, size.
regex = re.compile(
    r'([^ ]*) ([^ ]*) ([^ ]*) \[([^]]*)\] "([^"]*)" ([^ ]*) ([^ ]*)'
)


def parse_line(line):
    # Strip the trailing newline so it does not end up in the size field.
    m = regex.match(line.strip())
    if not m:
        # Lines that do not match are written as empty JSON objects.
        return "{}"
    host, ignore, user, date, request, status, size = m.groups()
    # The request line is "<method> <path> <protocol>"; pad missing parts with "-".
    method_items = request.split(" ")
    while len(method_items) < 3:
        method_items.append("-")
    _dict = {
        "host": host,
        "user": user,
        "timestamp": date,
        "method": method_items[0],
        "path": method_items[1],
        "protocol": method_items[2],
        "code": status,
        "size": size,
    }
    return json.dumps(_dict)


def translate_file(source, target):
    # Write one JSON object per input line.
    with open(source, 'r') as _in, open(target, 'w+') as _out:
        for line in _in:
            _out.write(parse_line(line))
            _out.write("\n")


if __name__ == '__main__':
    args = parser.parse_args()
    translate_file(args.source, args.target)
    print("Converted {} to {}".format(args.source, args.target))

  4. Ensure that your Apache logs are in the same directory as the above script.

  5. Run the following command:

python3 apache_log_converter.py -s <SOURCE FILENAME> -t <TARGET FILENAME>

  6. After you have generated a file that can be ingested by the Spherical Defense data ingestor, go back to the AMI quickstart below to continue the onboarding process.

Amazon Machine Image
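Before continuing, it may be worth checking how many lines the converter could actually parse, since lines that do not match its regular expression are written as empty JSON objects. The following is a minimal sketch; the function name summarize and the two sample lines are illustrative and not part of the Spherical Defense tooling.

```python
import json


def summarize(lines):
    """Count parsed and empty records in an iterable of JSON lines."""
    parsed, empty = 0, 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if record:
            parsed += 1
        else:
            # The converter writes "{}" for lines its regex could not match.
            empty += 1
    return parsed, empty


# Two illustrative lines in the shape the converter produces.
example = [
    '{"host": "127.0.0.1", "method": "GET", "path": "/apache_pb.gif"}',
    "{}",
]
print(summarize(example))
```

To check a real conversion, pass an open file handle for your target file to summarize instead of the example list. A high count of empty records suggests that the source log is not in Common Log Format.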

Combined Log Format (NGINX)

The Combined Log Format is the same as the Common Log Format above, but with two extra fields appended to each line:

  • A referrer URL

  • A user-agent

127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"
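The two extra fields can be captured by appending two quoted groups to the Common Log Format pattern. The sketch below is one possible way to do this, not a documented Spherical Defense converter: the function name parse_combined_line and the output keys "referrer" and "user_agent" are assumptions, and the pattern does not handle escaped double quotes inside a field.

```python
import json
import re

# Common Log Format fields followed by two quoted fields: referrer and user agent.
COMBINED = re.compile(
    r'([^ ]*) ([^ ]*) ([^ ]*) \[([^]]*)\] "([^"]*)" ([^ ]*) ([^ ]*)'
    r' "([^"]*)" "([^"]*)"'
)


def parse_combined_line(line):
    """Return a dict of fields, or None if the line does not match."""
    m = COMBINED.match(line.strip())
    if m is None:
        return None
    host, _identity, user, date, request, status, size, referrer, agent = m.groups()
    # Pad the request line so method, path, and protocol are always present.
    parts = request.split(" ") + ["-", "-", "-"]
    return {
        "host": host, "user": user, "timestamp": date,
        "method": parts[0], "path": parts[1], "protocol": parts[2],
        "code": status, "size": size,
        # Hypothetical key names for the two extra fields.
        "referrer": referrer, "user_agent": agent,
    }


# The example line shown above.
sample = (
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
    '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
    '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"'
)
record = parse_combined_line(sample)
print(json.dumps(record))
```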

IIS Log Format

2010-05-02 22:48:39 206.175.82.5 - 208.201.133.173 GET /global/images/navlineboards.gif - 200 540 324 157 HTTP/1.0 Mozilla/4.0+(compatible;+MSIE+4.01;+Windows+95) USERID=CustomerA;+IMPID=01234 http://www.loganalyzer.net

W3C Extended Log Format

2010-05-02 15:42:15 - 40.89.255.10  34.14.255.10 80 GET /default.htm 200 - HTTP/1.0 Mozilla/4.0  (compatible: MSIE+5.5+Windows+2000+Server)
