Reputation: 1181
I am currently searching through a log file that contains IP addresses.
Log example:
10.1.177.198 Tue Jun 19 09:25:16 CDT 2018
10.1.160.198 Tue Jun 19 09:25:38 CDT 2018
10.1.177.198 Tue Jun 19 09:25:36 CDT 2018
10.1.160.198 Tue Jun 19 09:25:40 CDT 2018
10.1.177.198 Tue Jun 19 09:26:38 CDT 2018
10.1.177.198 Tue Jun 19 09:27:16 CDT 2018
10.1.177.198 Tue Jun 19 09:28:38 CDT 2018
I can currently grab the IP address from the last line of the log. I can also search for all line numbers that have the same IP address.
If the last IP address in the log is listed 3 or more times in the log, how can I get the line number for the 3rd to last occurrence of that IP address?
For example, I want to get the line number for this line:
10.1.177.198 Tue Jun 19 09:26:38 CDT 2018
Or better yet, just print the entire line.
Here is an example of my code:
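def run():
    try:
        # Read every line once; the file is closed automatically.
        with open('read.log', 'r') as logfile:
            lines = logfile.readlines()
        # IP address from the last line of the log
        x1 = lines[-1].split()[0]
        for num, line in enumerate(lines, 0):
            # Compare the address field itself, not a substring of the line
            if line.split()[0] == x1:
                print("Found " + x1 + " at line:", num)
        print('Last Line: ' + x1)
    except OSError as e:
        print(e)

run()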
I am listing all the line numbers where the specific IP address occurs.
print("Found " + x1 + " at line:", num)
I want to print the line where "num" is the 3rd to last number in the list of line numbers.
My overall goal is to grab the IP address from the last line in the log file, then check if it has previously been listed more than 3 times. If it has, I want to find the 3rd to last listing of the address and get the line number (or just print the address and date listed on that line).
Upvotes: 0
Views: 90
Reputation: 1051
Track all the occurrences and print the 3rd one from the last. This can be optimized by using heapq.
def run():
    try:
        # Map each IP address to a list of (line number, time) tuples.
        ip_address_line_number = dict()
        x1 = None
        with open('log.txt', 'r') as logfile:
            for index, line in enumerate(logfile, 1):
                fields = line.split()
                x1 = fields[0]
                log_time = fields[4]
                ip_address_line_number.setdefault(x1, []).append((index, log_time))
        # After the loop, x1 holds the IP address from the last line.
        occurrences = ip_address_line_number.get(x1, [])
        if len(occurrences) > 2:
            print('Last Line: ' + str(occurrences[-3]))
        else:
            print(str(x1) + ' has 0-2 occurrences')
    except OSError as e:
        print(e)

run()
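If only the third-to-last occurrence matters, the per-address lists above can be capped so they never grow past three entries. Here is a minimal sketch of that idea, assuming `collections.deque` with `maxlen` rather than heapq (a different technique from the one suggested above). The function name `third_to_last` and its `lines` and `n` parameters are illustrative and not part of the original answer:

```python
from collections import defaultdict, deque


def third_to_last(lines, n=3):
    """Return (line_number, line) for the n-th to last occurrence of the
    IP address on the last non-blank line, or None if it occurs fewer
    than n times."""
    # Keep only the n most recent (line number, text) pairs per address.
    recent = defaultdict(lambda: deque(maxlen=n))
    last_ip = None
    for number, line in enumerate(lines, 1):
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        last_ip = fields[0]
        recent[last_ip].append((number, line.rstrip("\n")))
    if last_ip is None or len(recent[last_ip]) < n:
        return None
    # The oldest entry of a full deque is the n-th to last occurrence.
    return recent[last_ip][0]
```

Because an open file iterates line by line, the function can be called as `third_to_last(logfile)` inside a `with open('log.txt') as logfile:` block.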
Upvotes: 1
Reputation: 10860
Using pandas, this would be quite short:
import pandas as pd

# Fixed-width parse: the first 12 characters are the IP, the rest is the timestamp.
df = pd.read_fwf('read.log', colspecs=[(None, 12), (13, None)], header=None, names=['IP', 'time'])
lastIP = df.IP[df.index[-1]]                    # IP address on the last line
lastIP_idx = df.groupby('IP').groups[lastIP]    # row labels of every line with that IP
n = 3
if len(lastIP_idx) >= n:
    print('\t'.join(list(df.loc[lastIP_idx[-n]])))
else:
    print('occurrence number of ' + lastIP + ' < ' + str(n))
Upvotes: 0
Reputation: 44525
Another way to see this: if the file were read in reverse, we would be looking for 3+1 observations of the first ip.
There are many tools that can offer even simpler code, but here is one flexible, general approach geared for memory efficiency. Roughly, let's read the file in reverse and look for 3+1 observations.
Given
A file test.log
# test.log
10.1.177.198 Tue Jun 19 09:25:16 CDT 2018
10.1.160.198 Tue Jun 19 09:25:38 CDT 2018
10.1.177.198 Tue Jun 19 09:25:36 CDT 2018
10.1.160.198 Tue Jun 19 09:25:40 CDT 2018
10.1.177.198 Tue Jun 19 09:26:38 CDT 2018
10.1.177.198 Tue Jun 19 09:27:16 CDT 2018
10.1.177.198 Tue Jun 19 09:28:38 CDT 2018
and code for a reverse_readline() generator, we can write the following:
Code
def run(filename, target=3, min_=3):
    """Return the line number and data of the `target`-last observation.

    Parameters
    ----------
    filename : str or Path
        Filepath or name to file.
    target : int
        Number of final expected observations from the bottom,
        e.g. "third to last observation."
    min_ : int
        Total observations must exceed this number.
    """
    idx, prior, data = 0, "", []
    for i, line in enumerate(reverse_readline(filename)):
        ip, text = line.strip().split(maxsplit=1)
        if i == 0:
            target_ip = ip
        if target == 0:
            idx, *data = prior
        if ip == target_ip:
            target -= 1
            prior = i, ip, text
    # Edge case
    total_obs = prior[0]
    if total_obs < min_:
        print(f"Minimum observations was not met. Got {total_obs} observations.")
        return None
    # Compute line number
    line_num = (i - idx) + 1  # add 1 (zero-indexed)
    return [line_num] + data
Demo
run("test.log")
# [5, '10.1.177.198', 'Tue Jun 19 09:26:38 CDT 2018']
Second to last observation:
run("test.log", 2)
# [6, '10.1.177.198', 'Tue Jun 19 09:27:16 CDT 2018']
Minimum required observations:
run("test.log", 2, 7)
# Minimum observations was not met. Got 6 observations.
Add error handling as needed.
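The run() function above expects a reverse_readline() generator that is not shown in this answer. Here is a minimal stand-in, assuming the log is small enough to load into memory. It therefore gives up the memory efficiency this approach is aiming for; a buffered implementation that reads the file backwards in chunks would be needed for large logs:

```python
def reverse_readline(filename):
    """Yield the lines of `filename` from last to first, without newlines.

    Simplified stand-in: it reads the whole file first, so it is not
    memory efficient.
    """
    with open(filename, "r") as f:
        lines = f.read().splitlines()
    for line in reversed(lines):
        if line.strip():  # skip blank lines, e.g. a trailing empty line
            yield line
```

Skipping blank lines is a design choice made here so that the `split(maxsplit=1)` call in run() does not fail on an empty line.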
Details
Note: an "observation" is a line containing the targeted ip.
The file is read in reverse with the reverse_readline() generator.
target_ip is determined from the "first" line of the reversed file.
Only the most recent observation is kept, in prior (reducing memory consumption).
target is a counter that is decremented after each observation. When the target counter reaches 0, the prior observation is saved until the generator is exhausted.
prior is a tuple containing line data for the last observation of the target ip address, i.e. index, address and text.
After the generator is exhausted, total_obs is read from prior, and the final index i gives the length of the file, which is used to compute line_num.
Upvotes: 0