Reputation: 899
I'm new to Python and have been going through some tutorials on log parsing with regular expressions. With the code below I can parse a log and create a file listing the remote IPs that connect to the server. What I'm missing is the piece that eliminates duplicate IPs from the output file. Thanks
import re
import sys
infile = open("/var/log/user.log","r")
outfile = open("/var/log/intruders.txt","w")
pattern = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"
regexp = re.compile(pattern, re.VERBOSE)
for line in infile:
    result = regexp.search(line)
    if result:
        outfile.write("%s\n" % (result.group()))
infile.close()
outfile.close()
Upvotes: 1
Views: 1427
Reputation: 226171
You can save the results seen so far in a set() and then write out only the results that have not been seen yet. This logic is easy to add to your existing code:
import re
import sys
seen = set()  # IP addresses already written to the output file
infile = open("/var/log/user.log","r")
outfile = open("/var/log/intruders.txt","w")
pattern = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"
regexp = re.compile(pattern, re.VERBOSE)
for line in infile:
    mo = regexp.search(line)  # first IP-like match on the line, or None
    if mo is not None:
        ip_addr = mo.group()
        # Write each address only the first time it appears
        if ip_addr not in seen:
            seen.add(ip_addr)
            outfile.write("%s\n" % ip_addr)
infile.close()
outfile.close()
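If you want to try the set-based filtering without touching the real log, the same idea can be sketched as a small generator that works on any iterable of lines. This is only an illustration: the function name unique_ips and the sample log lines are made up for the example, and the pattern is the one from the question (so it still accepts out-of-range values such as 999.1.1.1).

```python
import re

# Same pattern as in the question; it matches any dotted quad and does
# not check that each number is in the 0-255 range.
IP_PATTERN = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")


def unique_ips(lines):
    """Yield the first IP-like match on each line, skipping repeats.

    Hypothetical helper for illustration; not part of the original code.
    """
    seen = set()
    for line in lines:
        mo = IP_PATTERN.search(line)
        if mo is None:
            continue  # no IP-like text on this line
        ip_addr = mo.group()
        if ip_addr not in seen:
            seen.add(ip_addr)
            yield ip_addr


# Made-up sample lines standing in for /var/log/user.log
sample = [
    "Jan 1 sshd: Failed password from 10.0.0.5 port 22",
    "Jan 1 sshd: Failed password from 10.0.0.7 port 22",
    "Jan 1 sshd: Failed password from 10.0.0.5 port 22",
    "Jan 1 cron: job started",
]
result = list(unique_ips(sample))
print(result)  # ['10.0.0.5', '10.0.0.7']
```

With real files you could pass the open file object straight in, for example inside a with block: outfile.writelines(ip + "\n" for ip in unique_ips(infile)). The with statement closes both files even if an error occurs partway through.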
Upvotes: 5