Reputation: 50
I recently ran into a problem while analyzing a log file.
"A single-line log file, 10 GB in size, needs to be read and all IP addresses in it must be printed."
Issue: The whole file is one line, so reading it line by line would load all 10 GB into memory at once. It has to be read character by character instead.
Solution:
#!/usr/bin/python
import re

def getIP():
    # Applied to one character at a time, this matches a digit or a dot.
    ip = re.compile(r'\d+|\.')
    out = []
    with open("./ipaddr", "r") as f:
        while True:
            c = f.read(1)
            if not c:
                break
            if ip.match(c):
                out.append(c)
                # An IPv4 address is at most 15 characters, so read up to 14 more.
                for i in range(14):
                    c = f.read(1)
                    if ip.match(c):
                        out.append(c)
                    else:
                        if out:
                            yield "".join(out)
                            out = []

print(list(getIP()))
Any ideas on how to simplify this?
Upvotes: 0
Views: 307
Reputation: 7886
This should do it:
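import re
from functools import partial

def getIP(file_name):
    ip_regex = re.compile(r"(?:\d{1,3}\.){3}\d{1,3}")
    current = ""
    with open(file_name) as file:
        # iter() with a sentinel keeps calling file.read(1) until it returns "" at EOF.
        for c in iter(partial(file.read, 1), ""):
            current = (current + c)[-15:]  # 15-character sliding window
            if len(current) < 15:
                continue  # wait for a full window so a match is never cut short
            m = ip_regex.match(current)
            if m:
                yield m.group()
                # m.end() is where the match stops; m.endpos would discard the whole window.
                current = current[m.end():]
    # Flush whatever is still in the window at end of file.
    for m in ip_regex.finditer(current):
        yield m.group()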
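If one `read(1)` call per character turns out to be too slow on 10 GB, reading in larger chunks is another option. The following is only a sketch and not part of the code above: `iter_ips` and `chunk_size` are names made up for illustration. It assumes the file decodes as text and that runs of digits and dots stay short enough to hold in memory. Any trailing run of digits and dots is held back and joined to the next chunk, so an address split across a chunk boundary is still matched whole.

```python
import re

# Four dot-separated groups of 1-3 digits; the 0-255 range is not validated.
IP_RE = re.compile(r"(?:\d{1,3}\.){3}\d{1,3}")


def iter_ips(file_name, chunk_size=1 << 20):
    """Yield IP-like strings from a file, reading chunk_size characters at a time."""
    tail = ""
    with open(file_name) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            buf = tail + chunk
            # Hold back the trailing run of digits/dots: it may continue in the next chunk.
            cut = len(buf)
            while cut > 0 and (buf[cut - 1].isdigit() or buf[cut - 1] == "."):
                cut -= 1
            for m in IP_RE.finditer(buf[:cut]):
                yield m.group()
            tail = buf[cut:]
    # Whatever was held back at end of file can no longer grow, so scan it now.
    for m in IP_RE.finditer(tail):
        yield m.group()
```

Looping with `for ip in iter_ips("./ipaddr"): print(ip)` prints each address as it is found instead of building the full list in memory first.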
Upvotes: 1