Reputation: 45
I am trying to extract the information from the following text file and convert it to a pandas dataframe, the text file looks like:
disp alarm active
1:Critical 2:Major 3:Minor 4:Warning
--------------------------------------------------------------------------------
Sequence AlarmId Level Date Time Description
--------------------------------------------------------------------------------
142527 0x813007C 1 2022-10-20 Loss of signal alarm. (hwPhysicalPortIfIn
12:23:37+ dex=17043714,BoardID=4,CardID=0,PortID=1,
02:00 DST InterfaceName=GigabitEthernet4/0/1,AlarmN
ame=ETH_LOS)
142526 0x8520003 2 2022-10-20 The interface status changes. (ifName=Gig
12:23:29+ abitEthernet4/0/1.195, AdminStatus=UP, Op
02:00 DST erStatus=DOWN, Reason=The main interface
is down, mainIfname=GigabitEthernet4/0/1)
--------------------------------------------------------------------------------
<BX1273RA3>
The information starts from line 6, the column names are Sequence, AlarmId, Level, Date Time, Description, the biggest problem is data time that is in several lines. I have this code:
import pandas as pd
f = open("log.log", "r")
lineas = f.readlines()
f.close()
headers = lineas[3]
print(headers)
# delete lineas without information
indexes_to_be_removed = [0, 1, 2, 4, -2, -1]
for idx in sorted(indexes_to_be_removed, reverse = True):
del lineas[idx]
# clear black spaces
lineas= [linea.strip() for linea in lineas]
concatened = ''
for linea in lineas:
concatened = concatened + linea + '\n'
print(concatened)
Should I use regular expressions or is it not necessary in this case? Thanks for the help.
Upvotes: 0
Views: 54
Reputation: 4654
from io import StringIO
import pandas as pd
buf = StringIO()
with open("log.log") as f:
for i, line in enumerate(f.readlines()):
if (i in [0, 1, 2, 4, 14, 15]) or (line.strip() == ""):
continue
buf.write(
"|".join([line[:10], line[10:20], line[20:27], line[27:38], line[38:]])
)
buf.seek(0)
df = pd.read_table(buf, sep="|", header=0)
buf.close()
The above should give you a dataframe like this:
>>> df
Sequence AlarmId Level Date Time Description
0 142527 0x813007C 1 2022-10-20 Loss of signal alarm. (hwPhysicalPortIfIn
1 12:23:37+ dex=17043714,BoardID=4,CardID=0,PortID=1,
2 02:00 DST InterfaceName=GigabitEthernet4/0/1,AlarmN
3 ame=ETH_LOS)
4 142526 0x8520003 2 2022-10-20 The interface status changes. (ifName=Gig
5 12:23:29+ abitEthernet4/0/1.195, AdminStatus=UP, Op
6 02:00 DST erStatus=DOWN, Reason=The main interface
7 is down, mainIfname=GigabitEthernet4/0/1)
>>>
That's a reasonable starting point for further processing.
E.g you can identify the different blocks of rows like this:
>>> df.columns = df.columns.str.strip() # cleanup column names
>>> df["Sequence"].str.strip() == ''
0 False
1 True
2 True
3 True
4 False
5 True
6 True
7 True
Upvotes: 2