crist_9

Reputation: 45

txt to pandas df python

I am trying to extract the information from the following text file and convert it to a pandas DataFrame. The text file looks like this:

disp alarm active
1:Critical  2:Major  3:Minor  4:Warning
--------------------------------------------------------------------------------
Sequence   AlarmId    Level Date Time  Description
--------------------------------------------------------------------------------
142527     0x813007C  1     2022-10-20 Loss of signal alarm. (hwPhysicalPortIfIn
                             12:23:37+ dex=17043714,BoardID=4,CardID=0,PortID=1,
                            02:00 DST  InterfaceName=GigabitEthernet4/0/1,AlarmN
                                       ame=ETH_LOS)
142526     0x8520003  2     2022-10-20 The interface status changes. (ifName=Gig
                             12:23:29+ abitEthernet4/0/1.195, AdminStatus=UP, Op
                            02:00 DST  erStatus=DOWN, Reason=The main interface
                                       is down, mainIfname=GigabitEthernet4/0/1)

--------------------------------------------------------------------------------
<BX1273RA3>

The information starts at line 6, and the column names are Sequence, AlarmId, Level, Date Time, Description. The biggest problem is the Date Time value, which spans several lines. I have this code:


import pandas as pd

f = open("log.log", "r")
lineas = f.readlines()
f.close()

headers = lineas[3]
print(headers)

# delete lines without useful information (decoration around the header and the trailer)
indexes_to_be_removed = [0, 1, 2, 4, -2, -1]

for idx in sorted(indexes_to_be_removed, reverse=True):
    del lineas[idx]

# strip leading/trailing whitespace from each line
lineas = [linea.strip() for linea in lineas]

concatened = ''
for linea in lineas:
    concatened = concatened + linea + '\n'

print(concatened)

Should I use regular expressions or is it not necessary in this case? Thanks for the help.

Upvotes: 0

Views: 54

Answers (1)

suvayu

Reputation: 4654

from io import StringIO

import pandas as pd

buf = StringIO()

with open("log.log") as f:
    for i, line in enumerate(f.readlines()):
        # skip the decoration lines around the header, the trailer, and blank lines
        if (i in [0, 1, 2, 4, 14, 15]) or (line.strip() == ""):
            continue
        # cut the fixed-width columns and rejoin them with a "|" delimiter
        buf.write(
            "|".join([line[:10], line[10:20], line[20:27], line[27:38], line[38:]])
        )

buf.seek(0)
df = pd.read_table(buf, sep="|", header=0)
buf.close()

The above should give you a dataframe like this:

>>> df
   Sequence     AlarmId      Level   Date Time                                  Description
0  142527       0x813007C    1       2022-10-20   Loss of signal alarm. (hwPhysicalPortIfIn
1                                     12:23:37+   dex=17043714,BoardID=4,CardID=0,PortID=1,
2                                    02:00 DST    InterfaceName=GigabitEthernet4/0/1,AlarmN
3                                                                              ame=ETH_LOS)
4  142526       0x8520003    2       2022-10-20   The interface status changes. (ifName=Gig
5                                     12:23:29+   abitEthernet4/0/1.195, AdminStatus=UP, Op
6                                    02:00 DST     erStatus=DOWN, Reason=The main interface
7                                                 is down, mainIfname=GigabitEthernet4/0/1)
>>>

That's a reasonable starting point for further processing.

E.g., you can identify the different blocks of rows like this:

>>> df.columns = df.columns.str.strip()  # cleanup column names
>>> df["Sequence"].str.strip() == ''
0    False
1     True
2     True
3     True
4    False
5     True
6     True
7     True

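One possible next step, as a minimal sketch assuming the df and the stripped column names from the snippets above: use the non-empty Sequence cells to label each alarm block with a cumulative sum, then collapse the wrapped rows into one record per alarm.

# sketch: assumes df from above with stripped column names
cells = df.fillna("").astype(str).apply(lambda col: col.str.strip())

# a new alarm block starts wherever Sequence is non-empty
block = (cells["Sequence"] != "").cumsum()

records = cells.groupby(block).agg(
    {
        "Sequence": "first",
        "AlarmId": "first",
        "Level": "first",
        "Date Time": lambda parts: " ".join(p for p in parts if p),
        "Description": "".join,
    }
).reset_index(drop=True)

Note that Description is wrapped at fixed column widths, so joining the pieces back to back may drop a space at a wrap point; check against the raw file if that matters.
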
Upvotes: 2
