Parsing Apache2 access logs with the OpenTelemetry Collector
I couldn't find a ton of resources on this, but FYI -- the OpenTelemetry Collector's filelog
receiver has a pretty robust regex parser built into it. Want to get your access.log files from Apache? Here's the config.
filelog/access:
include: [ /var/log/apache2/access.log ]
operators:
- type: regex_parser
regex: '(?P<ip>\d{1,3}(?:\.\d{1,3}){3}) - - \[(?P<datetime>[^\]]+)] "(?P<method>\S+) (?P<path>\S+) (?P<protocol>\S+)" (?P<status>\d{3}) (?P<size>\d+) "(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)'
timestamp:
parse_from: attributes["datetime"]
layout: '%d/%b/%Y:%H:%M:%S %z'
severity:
parse_from: attributes["status"]
The documentation for a lot of this stuff is stuck inside the GitHub repositories for the receiver modules, so be sure to check that out if you're looking for a quick reference.
What if we want to go further and turn our attributes into their appropriate semantic conventions? While there's no explicit log conventions for HTTP servers, the Span ones should work for our purposes.
transform:
error_mode: ignore
log_statements:
- context: log
statements:
- replace_all_patterns(attributes, "key", "method", "http.request.method")
- replace_all_patterns(attributes, "key", "status", "http.response.status_code")
- replace_all_patterns(attributes, "key", "user_agent", "user_agent.original")
- replace_all_patterns(attributes, "key", "ip", "client.address")
- replace_all_patterns(attributes, "key", "path", "url.path")
- delete_key(attributes, "datetime")
- delete_key(attributes, "size")
This should be enough to get started, at least, although there's more you might want to do:
- Add resource attributes for the logical service name (apache, reverse-proxy, etc.)
- Change up your Apache log format to get more information like the scheme, or time spent serving the request.
neat, those likes show up! I need to putz with it to get them to show up as likes tho
The regex looks decent. I’m also realizing the article on log parsing I wrote back in 2004 has been deleted from the O’Reilly website.
I need to tweak the output a bit to get it to align with otel semantic conventions, I was just surprised I couldn’t find a premade example…