Anand

Reputation: 1387

How to parse IP addresses from Apache Server Log?

I have to find the most frequently occurring IP addresses in Apache logs.

12.1.12.1 9000 127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"

12.1.12.1 9000 192.145.1.23 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"

How do I extract the IP addresses (i.e. the 3rd word in each line) using regular expressions in Java? I also have to find the most common IP addresses among them, in order to detect robotic access. The log contains millions of lines, so a regexp may be suitable for this.

Upvotes: 1

Views: 4448

Answers (4)

Lucas Zamboulis

Reputation: 2551

As others have pointed out, you don't need regexes. You shouldn't use String.split either, since its delimiter argument is a regex as well. You could use StringTokenizer instead. Assuming you use a BufferedReader named br to read in each line:

String line = br.readLine();
StringTokenizer st = new StringTokenizer(line, " ");
st.nextToken();              // skip the 1st field
st.nextToken();              // skip the 2nd field
String ip = st.nextToken();  // the 3rd field is the IP address
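
To cover the second half of the question (finding the most common addresses), the same tokenizing step can feed a HashMap of counters. This is only a minimal sketch: the class name IpCounter and its methods countIps and top are made up for illustration, the sample lines in main are shortened versions of the ones in the question, and lines with fewer than three fields are assumed to be safe to skip.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

// Hypothetical helper, not part of any library.
class IpCounter {

    /** Counts how often each third field (assumed to be the IP address) occurs. */
    static Map<String, Integer> countIps(BufferedReader br) {
        Map<String, Integer> counts = new HashMap<>();
        // lines() reports read errors as UncheckedIOException
        br.lines().forEach(line -> {
            StringTokenizer st = new StringTokenizer(line, " ");
            String token = null;
            int seen = 0;
            while (seen < 3 && st.hasMoreTokens()) {
                token = st.nextToken();
                seen++;
            }
            if (seen == 3) { // lines with fewer than three fields are skipped
                counts.merge(token, 1, Integer::sum);
            }
        });
        return counts;
    }

    /** Returns up to n entries, highest count first. */
    static List<Map.Entry<String, Integer>> top(Map<String, Integer> counts, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        return entries.subList(0, Math.min(n, entries.size()));
    }

    public static void main(String[] args) {
        // Shortened sample lines; a real run would wrap a FileReader instead.
        String log = "12.1.12.1 9000 127.0.0.1 - frank\n"
                   + "12.1.12.1 9000 192.145.1.23 - frank\n"
                   + "12.1.12.1 9000 127.0.0.1 - frank\n";
        Map<String, Integer> counts = countIps(new BufferedReader(new StringReader(log)));
        for (Map.Entry<String, Integer> e : top(counts, 10)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

Sorting all distinct addresses is simple and is probably fine as long as the number of distinct IPs is much smaller than the number of log lines.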

Upvotes: 3

Costis Aivalis

Reputation: 13728

The format of the access log file always depends on the settings in the configuration file. Instead of assuming that the IP address is the third 'word', it would probably be better to read the current configuration file and parse the access log according to its LogFormat entry.

Apache httpd is configured through httpd.conf and Tomcat through server.xml. Since server.xml is an XML file, reading the AccessLogValve settings from it is a standard parsing task.

This is a little more work, but it will make your application more flexible in case it has to stay in use over time. For this approach, I think string methods will be easier to use than regular expressions.
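
As a rough sketch of that idea, the position of the client-address directive can be looked up in the LogFormat string and then used as the field index. The class LogFormatFields and its methods are hypothetical, and the format string used in main is only a guess at what could have produced the lines in the question; the real one has to come from the configuration file. The lookup is only valid when every directive before the one searched for expands to a value without spaces.

```java
// Hypothetical helper, not part of Apache or Tomcat.
class LogFormatFields {

    /**
     * Returns the zero-based position of a directive such as "%h" among the
     * space-separated tokens of a LogFormat string, or -1 if it is absent.
     */
    static int fieldIndex(String logFormat, String directive) {
        String[] tokens = logFormat.trim().split(" +");
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].equals(directive)) {
                return i;
            }
        }
        return -1;
    }

    /** Returns the field at the given position, or null if the line is too short. */
    static String field(String line, int index) {
        if (index < 0) {
            return null;
        }
        // The limit keeps the rest of the line in one final, unused element.
        String[] parts = line.split(" ", index + 2);
        return index < parts.length ? parts[index] : null;
    }

    public static void main(String[] args) {
        // Assumed format, for illustration only.
        String format = "%A %p %h %l %u %t \"%r\" %>s %b";
        String line = "12.1.12.1 9000 127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700]";
        int index = fieldIndex(format, "%h");
        System.out.println(field(line, index));
    }
}
```

Directives such as %t expand to values that contain a space, so anything after them would need real parsing rather than a simple position lookup.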

Upvotes: 0

aioobe

Reputation: 421280

Here is one solution:

String str1 = "12.1.12.1 9000 127.0.0.1 - frank [10/Oct/2000:13:55:36"
            + " -0700] \"GET /apache_pb.gif HTTP/1.0\" 200 2326 "
            + "\"http://www.example.com/start.html\" \"Mozilla/4.08 "
            + "[en] (Win98; I ;Nav)\"";

String str2 = "12.1.12.1 9000 192.145.1.23 - frank [10/Oct/2000:13:55"
            + ":36 -0700] \"GET /apache_pb.gif HTTP/1.0\" 200 2326 "
            + "\"http://www.example.com/start.html\" \"Mozilla/4.08 "
            + "[en] (Win98; I ;Nav)\"";

Pattern p = Pattern.compile("\\S+\\s+\\S+\\s+(\\S+).*");

Matcher m = p.matcher(str1);
if (m.matches())
    System.out.println(m.group(1));

m = p.matcher(str2);
if (m.matches())
    System.out.println(m.group(1));

Regex breakdown:

  • \S+, one or more non-white space characters.
  • \s+, one or more white space characters.
  • ...
  • (\S+), one or more non-white space characters, captured in group 1.
  • .*, the rest of the line.
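
For a log with millions of lines, the pattern would normally be compiled once and reused for every line. The sketch below is a variation on the code above rather than the same method: it drops the trailing .* and calls lookingAt(), which only requires a match at the start of the input. The class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class ThirdFieldRegex {

    // Compiled once and shared by all calls.
    private static final Pattern THIRD_FIELD =
            Pattern.compile("\\S+\\s+\\S+\\s+(\\S+)");

    /** Returns the third whitespace-separated field, or null if there is none. */
    static String thirdField(String line) {
        Matcher m = THIRD_FIELD.matcher(line);
        // lookingAt() anchors at the start but does not need to consume the whole line
        return m.lookingAt() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "12.1.12.1 9000 192.145.1.23 - frank [10/Oct/2000:13:55:36 -0700]";
        System.out.println(thirdField(line));
    }
}
```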

Upvotes: 1

chahuistle

Reputation: 2645

If you are certain that it is always the 3rd word (as you said), maybe you don't need regular expressions at all. You could just take the third word via a simple split.

However, a similar question has already been asked: Regular expression to match DNS hostname or IP Address?...
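
The simple split mentioned above could look roughly like this. It is a sketch that assumes fields are separated by exactly one space, as in the sample lines; the class and method names are made up. Passing a limit of 4 stops the split after the third field, so the rest of the line is left in one piece.

```java
// Hypothetical helper, not part of any library.
class ThirdWord {

    /** Returns the third space-separated word, or null if the line has fewer than three. */
    static String thirdWord(String line) {
        // At most 4 pieces: three fields plus the unsplit remainder of the line.
        String[] parts = line.split(" ", 4);
        return parts.length >= 3 ? parts[2] : null;
    }

    public static void main(String[] args) {
        String line = "12.1.12.1 9000 192.145.1.23 - frank [10/Oct/2000:13:55:36 -0700]";
        System.out.println(thirdWord(line));
    }
}
```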

Upvotes: 3
