Everaldo Aguiar

Reputation: 4126

Finding the format being used on Apache log

I am attempting to perform some data analysis on a set of Apache access logs that were passed on to me, but I noticed these logs do not seem to be in a conventional format (based on a few other Apache log examples I found online). Following is one row extracted from one of my log files (after some anonymization):

2013-08-25 10:06:11 EDT - "GET http://www.siteaddress.com/section/aaa/z/directory HTTP/1.1" 404 1677 1.2.181.171 "-" "Mozilla/4.0 (compatible; MSIE 4.01; Windows 98)" - 0 155311 -

Is there any way I can find out what format is being used to record these logs? That is, how would I go about getting some sort of a header for this file? P.S.: I have access to the server that is capturing these logs and could use it to find that information.

Edit 1: I was told to check the content of /etc/apache2/httpd.conf, which I found to be empty.

Edit 2: The following relevant piece was found within apache2.conf but I'm not quite sure these match what I'm seeing in the logs.

# The following directives define some format nicknames for use with
# a CustomLog directive (see below).
# If you are behind a reverse proxy, you might want to change %h into %{X-Forwarded-For}i

LogFormat "%v:%p %h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\""  vhost_combined
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
LogFormat "%h %l %u %t \"%r\" %>s %b" common
LogFormat "%{Referer}i -> %U" referer
LogFormat "%{User-agent}i" agent

Edit 3: Found this within /etc/apache2/sites-available/hub and it seems to match my format (Thanks a lot!)

LogFormat "%{%Y-%m-%d %H:%M:%S %Z}t %u \"%r\" %>s %B %a \"%{Referer}i\" \"%{User-Agent}i\" - %T %D -"
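For the data analysis itself, the LogFormat line above can be translated into a regular expression. The following is only a minimal Python sketch, assuming every line follows that exact format; the names LINE_RE, SAMPLE, and parse_line are made up for illustration, and the field names are my own labels for the Apache format codes.

```python
import re

# One named group per field of the LogFormat found in sites-available/hub.
LINE_RE = re.compile(
    r'(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \S+) '  # %{%Y-%m-%d %H:%M:%S %Z}t
    r'(?P<user>\S+) '                                      # %u  (remote user, "-" if none)
    r'"(?P<request>(?:[^"\\]|\\.)*)" '                     # "%r" (request line)
    r'(?P<status>\d{3}) '                                  # %>s (final status code)
    r'(?P<bytes>\d+) '                                     # %B  (response size in bytes)
    r'(?P<client>\S+) '                                    # %a  (client IP address)
    r'"(?P<referer>(?:[^"\\]|\\.)*)" '                     # "%{Referer}i"
    r'"(?P<agent>(?:[^"\\]|\\.)*)" '                       # "%{User-Agent}i"
    r'- (?P<seconds>\d+) (?P<micros>\d+) -$'               # literal "-", %T, %D, literal "-"
)

SAMPLE = (
    '2013-08-25 10:06:11 EDT - '
    '"GET http://www.siteaddress.com/section/aaa/z/directory HTTP/1.1" '
    '404 1677 1.2.181.171 "-" '
    '"Mozilla/4.0 (compatible; MSIE 4.01; Windows 98)" - 0 155311 -'
)


def parse_line(line):
    """Return a dict of string fields for one log line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


record = parse_line(SAMPLE)
print(record["status"], record["client"], record["micros"])
```

All values come back as strings, so numeric fields such as status or micros would still need converting with int() before any aggregation.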

Upvotes: 0

Views: 3822

Answers (2)

Aaron Miller

Reputation: 3780

You can find the format specification in one of the Apache configuration files. Depending on your server configuration, that may be the primary config file (probably /etc/apache2/apache2.conf, the default on Debian and its derivatives; /etc/httpd/conf/httpd.conf is common in the Red Hat family) or the configuration file for the virtual host whose logs you're looking at (/etc/apache2/sites-enabled/* on Debian and company; typically /etc/httpd/conf.d/*.conf on Red Hat).

The configuration directive you're after will be either LogFormat, which aliases a format string to a short name, or CustomLog, which uses either a format string, or a short name defined earlier in a LogFormat directive, to specify an actual logging format.
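If you'd rather not open every file by hand, a search for those two directives can be scripted. Here is a rough Python sketch; the function names find_log_directives and scan_config_tree are hypothetical, the default root of /etc/apache2 assumes a Debian-style layout, and directives continued across lines with a trailing backslash are not handled.

```python
import re
from pathlib import Path

# LogFormat or CustomLog at the start of a line (leading whitespace allowed).
# Commented-out lines start with "#" and therefore never match.
DIRECTIVE_RE = re.compile(r"^\s*(LogFormat|CustomLog)\s+(.*\S)\s*$", re.IGNORECASE)


def find_log_directives(text):
    """Return (line_number, directive, arguments) for each logging directive in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = DIRECTIVE_RE.match(line)
        if match:
            hits.append((lineno, match.group(1), match.group(2)))
    return hits


def scan_config_tree(root="/etc/apache2"):
    """Print every logging directive found in regular files under root."""
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(errors="replace")
        except OSError:
            continue  # unreadable file; skip it
        for lineno, directive, args in find_log_directives(text):
            print(f"{path}:{lineno}: {directive} {args}")
```

Calling scan_config_tree() on the server (possibly with a different root, such as /etc/httpd) lists each file and line where a log format is defined or used.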

The format string syntax is detailed in the Apache documentation.

To save you some time, from eyeballing the sample line and having had considerable experience of Apache logs, most of the format appears to be:

<datetime> - "<request method> <url> <HTTP version>" <response status> <response length> <client address> "<request Referer: header value?>" "<request User-Agent: header value>" - ?? <response duration in microseconds?> - <newline>

I don't know offhand what the parts involving question marks are, but the rest seem pretty obvious.

Upvotes: 1

Johannes H.

Reputation: 6167

On Debian-based distros, the config is split into multiple files. The main configuration is /etc/apache2/apache2.conf, while all vHosts (the ones that are used by default) are configured in /etc/apache2/sites-available. They might not be active, though: activating a virtual host with a2ensite creates a symlink in /etc/apache2/sites-enabled/, and the files get included from there.

If you have never messed with the config, the log format should be set either in /etc/apache2/sites-available/default, inside the VirtualHost container, or in apache2.conf.
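To see which of the files in sites-available are actually active, you can resolve the symlinks in sites-enabled. A small Python sketch of that check follows; the function name enabled_sites is hypothetical, and the default directory assumes the Debian layout described above.

```python
from pathlib import Path


def enabled_sites(enabled_dir="/etc/apache2/sites-enabled"):
    """Map each entry in enabled_dir to the real file its symlink resolves to."""
    enabled = Path(enabled_dir)
    if not enabled.is_dir():
        return {}  # directory missing, e.g. on a non-Debian layout
    return {entry.name: entry.resolve() for entry in sorted(enabled.iterdir())}


for name, target in enabled_sites().items():
    print(f"{name} -> {target}")
```

Any vHost file that never shows up as a target here is present in sites-available but not being included by Apache.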

Upvotes: 1
