elzwhere

Reputation: 899

Python log parsing for IPs

I'm new to Python and have been going through some tutorials on log parsing with regular expressions. The code below parses a log and creates a file of the remote IPs that made a connection to the server. I'm missing the piece that eliminates duplicate IPs from the output file. Thanks

import re
import sys

infile = open("/var/log/user.log","r")
outfile = open("/var/log/intruders.txt","w")

pattern = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"
regexp = re.compile(pattern, re.VERBOSE)

for line in infile:
  result = regexp.search(line)
  if result:
    outfile.write("%s\n" % (result.group()))

infile.close()
outfile.close()

Upvotes: 1

Views: 1427

Answers (1)

Raymond Hettinger

Reputation: 226171

You can save the results seen so far in a set() and write out only the results that have not yet been seen. This logic is easy to add to your existing code:

import re
import sys

# IP addresses that have already been written to the output file
seen = set()

infile = open("/var/log/user.log","r")
outfile = open("/var/log/intruders.txt","w")

pattern = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"
regexp = re.compile(pattern, re.VERBOSE)

for line in infile:
    mo = regexp.search(line)
    if mo is not None:
        ip_addr = mo.group()
        if ip_addr not in seen:
            seen.add(ip_addr)
            outfile.write("%s\n" % ip_addr)

infile.close()
outfile.close()
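
The same seen-set idea can also be wrapped in a small generator so it works on any iterable of lines, not only an open file. This is just a sketch: the names unique_ips and write_unique_ips are made up for illustration, and it assumes (as the code above does) that only the first IP-like match on each line matters. It also uses with-statements so the files are closed even if an error occurs.

```python
import re

# Same pattern as above: four dot-separated groups of 1-3 digits.
# It does not check that each group is in the range 0-255.
IP_PATTERN = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")


def unique_ips(lines):
    """Yield the first IP-like match on each line, skipping repeats.

    Hypothetical helper name, not part of the original code.
    """
    seen = set()  # IP addresses already yielded
    for line in lines:
        mo = IP_PATTERN.search(line)
        if mo is None:
            continue
        ip_addr = mo.group()
        if ip_addr not in seen:
            seen.add(ip_addr)
            yield ip_addr


def write_unique_ips(in_path, out_path):
    """Write each distinct IP found in in_path to out_path, one per line."""
    # The with-statement closes both files, even if an exception is raised.
    with open(in_path, "r") as infile, open(out_path, "w") as outfile:
        for ip_addr in unique_ips(infile):
            outfile.write("%s\n" % ip_addr)
```

With the paths from the question, the call would be write_unique_ips("/var/log/user.log", "/var/log/intruders.txt"). The IPs come out in the order they first appear in the log, since each one is written the first time it is seen.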

Upvotes: 5
