Using Access Logs
This page describes how to convert your existing access logs into a format that Spherical Defense can ingest and learn from.
Web server access logs rarely record HTTP POST payloads (request bodies), so this approach is not suitable if your traffic contains inbound HTTP payloads.
The Common Log Format (CLF) is the standard Apache format for web server access logs. It contains the following fields, in order:
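- 1. The IP address of the client (remote host) that made the request to the server.
- 2. The RFC 1413 identity of the client, as determined by identd on the client machine. This is almost always "-", meaning the information is not available.
- 3. The userid of the person requesting the document, as determined by HTTP authentication. Usually "-" unless .htaccess has requested authentication.
- 4. The date, time, and time zone at which the request was received, by default in the strftime format %d/%b/%Y:%H:%M:%S %z.
- 5. The request line from the client, in double quotes. In the example below, GET is the method, /apache_pb.gif is the requested resource, and HTTP/1.0 is the protocol.
- 6. The HTTP status code returned to the client. 2xx is a successful response, 3xx a redirection, 4xx a client error, and 5xx a server error.
- 7. The size of the object returned to the client in bytes, not including the response headers, or "-" if no content was returned.
For example:
127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326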
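To see how the example entry maps onto the seven fields, the following minimal Python sketch splits it with a regular expression that uses one named group per field, parses the timestamp with the strftime format from field 4, and derives the status class described in field 6. The group names, the `status_class` helper, and the choice to treat a size of "-" as 0 are illustrative assumptions, not part of Spherical Defense.

```python
import re
from datetime import datetime

# Hypothetical pattern: one named group per Common Log Format field.
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)


def status_class(status):
    """Map a status code string such as "200" to its class, e.g. "2xx"."""
    return status[0] + "xx"


line = ('127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] '
        '"GET /apache_pb.gif HTTP/1.0" 200 2326')
fields = CLF_PATTERN.match(line).groupdict()
method, path, protocol = fields["request"].split(" ")
# Field 4 uses %d/%b/%Y:%H:%M:%S %z (%b expects English month abbreviations).
received = datetime.strptime(fields["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
# "-" means no content was returned; treating it as 0 is an assumption.
size = 0 if fields["size"] == "-" else int(fields["size"])

print(fields["host"], fields["ident"], fields["user"])  # 127.0.0.1 user-identifier frank
print(method, path, protocol)                          # GET /apache_pb.gif HTTP/1.0
print(received.isoformat())                            # 2000-10-10T13:55:36-07:00
print(status_class(fields["status"]), size)            # 2xx 2326
```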
To convert a log file in this format into one that a running Spherical Defense instance can ingest, complete the following steps:
- 1. SSH into your Spherical Defense instance.
- 2. Create a new file called apache_log_converter.py in your home directory (~/).
- 3. Copy the following script into the file:
import argparse
import json
import re

parser = argparse.ArgumentParser(description="Convert Apache logs to JSON")
parser.add_argument("-s", "--source", type=str, required=True)
parser.add_argument("-t", "--target", type=str, required=True)

# Matches the seven Common Log Format fields. re.match only anchors at the
# start of the line, so Combined Log Format lines also match; their trailing
# referrer and user-agent fields are ignored.
regex = re.compile(
    r'([^ ]*) ([^ ]*) ([^ ]*) \[([^]]*)\] "([^"]*)" ([^ ]*) ([^ ]*)'
)


def parse_line(line):
    # Strip the trailing newline so it does not end up in the "size" field.
    m = regex.match(line.strip())
    if not m:
        return "{}"
    host, ident, user, date, request, status, size = m.groups()
    # Split the request line into method, path and protocol, padding with "-"
    # for incomplete request lines such as "-" or "GET /".
    method_items = request.split(" ")
    method_items += ["-"] * (3 - len(method_items))
    _dict = {
        "host": host,
        "user": user,
        "timestamp": date,
        "method": method_items[0],
        "path": method_items[1],
        "protocol": method_items[2],
        "code": status,
        "size": size,
    }
    return json.dumps(_dict)


def translate_file(source, target):
    # Write one JSON object per input line.
    with open(source, "r") as _in, open(target, "w") as _out:
        for line in _in:
            _out.write(parse_line(line))
            _out.write("\n")


if __name__ == "__main__":
    args = parser.parse_args()
    translate_file(args.source, args.target)
    print("Converted {} to {}".format(args.source, args.target))
- 4. Ensure that your Apache logs are in the same directory as the script.
- 5. Run the following command:
python3 apache_log_converter.py -s <SOURCE FILENAME> -t <TARGET FILENAME>
- 6. Once you have generated a file that the Spherical Defense data ingestor can read, return to the AMI quickstart to continue the onboarding process.
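Before handing the converted file to the data ingestor, it can be worth checking it. The converter writes one JSON object per line and emits an empty object `{}` for every line its regular expression cannot match, so counting those empty objects shows how much of the log was not recognized. The following is a minimal sketch; the `count_records` name is hypothetical.

```python
import json


def count_records(lines):
    """Return (converted, unrecognized) counts for the converter's output.

    `lines` can be any iterable of strings, such as an open file object.
    """
    converted = unrecognized = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        record = json.loads(line)  # raises ValueError if the line is not valid JSON
        if record:
            converted += 1
        else:
            unrecognized += 1  # the converter writes "{}" for lines it cannot parse
    return converted, unrecognized
```

Pass it an open handle to your target file. A high unrecognized count usually means the source log is not in the Common or Combined Log Format.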
The Combined Log Format is the same as the Common Log Format, but appends two extra quoted fields:
- The referrer URL
- The user agent
For example:
127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"
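The converter script above matches only the first seven fields, so it silently drops the referrer and user agent. If you want to keep them, one option is to extend the regular expression with two more quoted groups, as in the sketch below. The `parse_combined_line` function and the `referrer` and `user_agent` keys are hypothetical; check that your Spherical Defense ingestor accepts these keys before relying on them.

```python
import json
import re

# Common Log Format fields followed by the two quoted Combined Log Format fields.
COMBINED_REGEX = re.compile(
    r'([^ ]*) ([^ ]*) ([^ ]*) \[([^\]]*)\] "([^"]*)" ([^ ]*) ([^ ]*)'
    r' "([^"]*)" "([^"]*)"'
)


def parse_combined_line(line):
    """Convert one Combined Log Format line to JSON, or "{}" if it does not match."""
    m = COMBINED_REGEX.match(line.strip())
    if not m:
        return "{}"
    host, ident, user, date, request, status, size, referrer, agent = m.groups()
    # Pad incomplete request lines with "-", as the converter does.
    method_items = request.split(" ")
    method_items += ["-"] * (3 - len(method_items))
    return json.dumps({
        "host": host, "user": user, "timestamp": date,
        "method": method_items[0], "path": method_items[1],
        "protocol": method_items[2], "code": status, "size": size,
        # Hypothetical keys for the two extra Combined Log Format fields.
        "referrer": referrer, "user_agent": agent,
    })
```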
Other servers, such as Microsoft IIS, write space-separated logs in the W3C Extended Log File Format, for example:
2010-05-02 22:48:39 206.175.82.5 - 208.201.133.173 GET /global/images/navlineboards.gif - 200 540 324 157 HTTP/1.0 Mozilla/4.0+(compatible;+MSIE+4.01;+Windows+95) USERID=CustomerA;+IMPID=01234 http://www.loganalyzer.net
2010-05-02 15:42:15 - 40.89.255.10 34.14.255.10 80 GET /default.htm 200 - HTTP/1.0 Mozilla/4.0 (compatible: MSIE+5.5+Windows+2000+Server)
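Lines like the last two samples above contain neither a bracketed timestamp nor a quoted request line, so the converter's regular expression does not match them and it writes `{}` instead. Space-separated formats such as the W3C Extended Log File Format normally declare their columns in a `#Fields:` directive near the top of the file. The sketch below uses that directive to map each entry to a dictionary. It assumes the file contains a `#Fields:` line and that no field value contains a literal space; the `parse_w3c` name is hypothetical, and the resulting keys (such as `cs-method`) would still need renaming to the keys the converter emits.

```python
def parse_w3c(lines):
    """Yield one dict per entry of a W3C extended log, keyed by the #Fields names."""
    fields = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#Fields:"):
            # Column names follow the directive, separated by spaces.
            fields = line[len("#Fields:"):].split()
            continue
        if line.startswith("#") or fields is None:
            # Skip other directives, and data that appears before any #Fields line.
            continue
        values = line.split()
        # Assumption: entries whose column count differs are malformed and skipped.
        if len(values) == len(fields):
            yield dict(zip(fields, values))
```

Because it is a generator, it can be fed an open file object directly and consumed line by line, which keeps memory use flat for large logs.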