Dmitry Dobberen

Analyze logs with Python

I have a CSV file with logs. I need to analyze it and extract specific information from it. The problem is that the file contains many tables, each with its own header row and none of them named. The tables are separated from each other by empty lines. For example, say I need to select all values from the %idle column where CPU = all.

Structure:

09:20:06,CPU,%usr,%nice,%sys,%iowait,%steal,%irq,%soft,%guest,%idle
09:21:06,all,4.98,0.00,5.10,0.00,0.00,0.00,0.06,0.00,89.86
09:21:06,0,12.88,0.00,5.62,0.03,0.00,0.02,1.27,0.00,80.18

12:08:06,CPU,%usr,%nice,%sys,%iowait,%steal,%irq,%soft,%guest,%idle
12:09:06,all,5.48,0.00,5.24,0.00,0.00,0.00,0.12,0.00,89.15
12:09:06,0,18.57,0.00,5.35,0.02,0.00,0.00,3.00,0.00,73.06

09:20:06,runq-sz,plist-sz,ldavg-1,ldavg-5,ldavg-15
09:21:06,3,1444,2.01,2.12,2.15
09:22:06,4,1444,2.15,2.14,2.15

Answers (2)

Liju

You can use the program below to parse this CSV.

result = {}
with open("log.csv", "r") as f:
    # Tables are separated by a blank line
    for table in f.read().strip().split("\n\n"):
        rows = table.split("\n")
        # Skip the first field (the timestamp) of the header and of each row
        header = rows[0].split(",")[1:]
        for row in rows[1:]:
            for column, value in zip(header, row.split(",")[1:]):
                result.setdefault(column, []).append(value)
print(result["%idle"])

Output (values of %idle)

['89.86', '80.18', '89.15', '73.06']

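This assumes that, within each table, the values in every row are in the same order as the header columns, and that tables of different types do not share a column name (values from all tables with the same column name are merged into one list). Note that the output contains %idle for every CPU row, not only for CPU = all.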
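If you only want the rows where CPU = all, one option is to track the current header while reading and filter on the CPU column. The sketch below is an assumption-based variant, not the answer's original code: it swaps the manual string splitting for the standard `csv` module, assumes each table starts with a header row and tables are separated by blank lines (which `csv.reader` yields as empty lists), and uses a hypothetical function name `idle_for_all_cpus`. The sample data is copied from the question.

```python
import csv
import io
from typing import Iterable, List

SAMPLE = """\
09:20:06,CPU,%usr,%nice,%sys,%iowait,%steal,%irq,%soft,%guest,%idle
09:21:06,all,4.98,0.00,5.10,0.00,0.00,0.00,0.06,0.00,89.86
09:21:06,0,12.88,0.00,5.62,0.03,0.00,0.02,1.27,0.00,80.18

12:08:06,CPU,%usr,%nice,%sys,%iowait,%steal,%irq,%soft,%guest,%idle
12:09:06,all,5.48,0.00,5.24,0.00,0.00,0.00,0.12,0.00,89.15
12:09:06,0,18.57,0.00,5.35,0.02,0.00,0.00,3.00,0.00,73.06

09:20:06,runq-sz,plist-sz,ldavg-1,ldavg-5,ldavg-15
09:21:06,3,1444,2.01,2.12,2.15
09:22:06,4,1444,2.15,2.14,2.15
"""


def idle_for_all_cpus(lines: Iterable[str]) -> List[str]:
    """Return %idle values from rows whose CPU column is 'all' (hypothetical helper)."""
    result = []
    header = None  # header of the table currently being read
    for row in csv.reader(lines):
        if not row:  # blank line: the next non-empty row is a new header
            header = None
            continue
        if header is None:
            header = row
            continue
        # Skip the timestamp field and map column names to values
        record = dict(zip(header[1:], row[1:]))
        # Tables without a CPU column (e.g. load averages) are ignored
        if record.get("CPU") == "all":
            result.append(record["%idle"])
    return result


if __name__ == "__main__":
    print(idle_for_all_cpus(io.StringIO(SAMPLE)))  # ['89.86', '89.15']
```

For a real file, something like `with open("log.csv", newline="") as f: idle_for_all_cpus(f)` should work the same way, since a file object is an iterable of lines.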

One rather dumb solution would be to use an "ordinary" file reader for the original CSV. You can read everything up to the next blank line as a single CSV and then parse the text you just read in memory.

Every time you "see" a blank line, you know that what follows is an entirely new CSV, so you can repeat the same procedure for it.

For example, you would have one string that contained:

09:20:06,CPU,%usr,%nice,%sys,%iowait,%steal,%irq,%soft,%guest,%idle
09:21:06,all,4.98,0.00,5.10,0.00,0.00,0.00,0.06,0.00,89.86
09:21:06,0,12.88,0.00,5.62,0.03,0.00,0.02,1.27,0.00,80.18

and then parse it in memory. When you reach the blank line after that, you know you need a new string containing the following:

12:08:06,CPU,%usr,%nice,%sys,%iowait,%steal,%irq,%soft,%guest,%idle
12:09:06,all,5.48,0.00,5.24,0.00,0.00,0.00,0.12,0.00,89.15
12:09:06,0,18.57,0.00,5.35,0.02,0.00,0.00,3.00,0.00,73.06

and so on; you can keep going like this for as many tables as the file contains.
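The approach described above can be sketched as follows. This is a minimal sketch, assuming each table's first line is its header and tables are separated by one or more blank lines; `read_tables` is a hypothetical name. Lines are collected with an ordinary line-by-line reader until a blank line, and the collected text is then parsed in memory with `csv.DictReader` over an `io.StringIO` buffer.

```python
import csv
import io
from typing import Dict, Iterator, List, TextIO

SAMPLE = """\
09:20:06,CPU,%usr,%nice,%sys,%iowait,%steal,%irq,%soft,%guest,%idle
09:21:06,all,4.98,0.00,5.10,0.00,0.00,0.00,0.06,0.00,89.86
09:21:06,0,12.88,0.00,5.62,0.03,0.00,0.02,1.27,0.00,80.18

12:08:06,CPU,%usr,%nice,%sys,%iowait,%steal,%irq,%soft,%guest,%idle
12:09:06,all,5.48,0.00,5.24,0.00,0.00,0.00,0.12,0.00,89.15
12:09:06,0,18.57,0.00,5.35,0.02,0.00,0.00,3.00,0.00,73.06

09:20:06,runq-sz,plist-sz,ldavg-1,ldavg-5,ldavg-15
09:21:06,3,1444,2.01,2.12,2.15
09:22:06,4,1444,2.15,2.14,2.15
"""


def read_tables(f: TextIO) -> Iterator[List[Dict[str, str]]]:
    """Yield each blank-line-separated table as a list of row dicts."""
    buffer = []  # lines of the table currently being collected
    for line in f:
        if line.strip():
            buffer.append(line)
            continue
        # Blank line: parse everything collected so far as one CSV in memory
        if buffer:
            yield list(csv.DictReader(io.StringIO("".join(buffer))))
            buffer = []
    if buffer:  # the last table may not be followed by a blank line
        yield list(csv.DictReader(io.StringIO("".join(buffer))))


if __name__ == "__main__":
    idle = [
        row["%idle"]
        for table in read_tables(io.StringIO(SAMPLE))
        for row in table
        if row.get("CPU") == "all"  # load-average rows have no CPU key
    ]
    print(idle)  # ['89.86', '89.15']
```

Because each table is parsed with its own header, tables with different columns (such as the load-average table) do not interfere with each other.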
