Reputation: 25
I have a csv file with logs. I need to analyze it and select the necessary information from the file. The problem is that it has a lot of tables with headers. They don't have names. Tables are separated by empty rows and are also separated from each other. Let's say I need to select all data from the %idle column, where CPU = all
Structure:
09:20:06,CPU,%usr,%nice,%sys,%iowait,%steal,%irq,%soft,%guest,%idle
09:21:06,all,4.98,0.00,5.10,0.00,0.00,0.00,0.06,0.00,89.86
09:21:06,0,12.88,0.00,5.62,0.03,0.00,0.02,1.27,0.00,80.18
12:08:06,CPU,%usr,%nice,%sys,%iowait,%steal,%irq,%soft,%guest,%idle
12:09:06,all,5.48,0.00,5.24,0.00,0.00,0.00,0.12,0.00,89.15
12:09:06,0,18.57,0.00,5.35,0.02,0.00,0.00,3.00,0.00,73.06
09:20:06,runq-sz,plist-sz,ldavg-1,ldavg-5,ldavg-15
09:21:06,3,1444,2.01,2.12,2.15
09:22:06,4,1444,2.15,2.14,2.15
Upvotes: 0
Views: 241
Reputation: 2303
You can use below program to parse this csv.
result={}
with open("log.csv","r") as f:
for table in f.read().split("\n\n"):
rows=table.split("\n")
header=rows[0]
for row in rows[1:]:
for i,j in zip(header.split(",")[1:],row.split(",")[1:]):
if i in result:
result[i].append(j)
else:
result[i]=[j]
print(result["%idle"])
Output (values of %idle)
['89.86', '80.18', '89.15', '73.06']
This assumes the table column and row values are in same order and no two tables have common column name.
Upvotes: 1
Reputation: 12181
One rather dumb solution would be to use an "ordinary" file reader for the original CSV. You can read everything up to a new line break as a single CSV and then parse the text you just read in memory.
Every time you "see" a line break, you know to treat it as an entirely new CSV, so you can repeat the above procedure for it.
For example, you would have one string that contained:
09:20:06,CPU,%usr,%nice,%sys,%iowait,%steal,%irq,%soft,%guest,%idle
09:21:06,all,4.98,0.00,5.10,0.00,0.00,0.00,0.06,0.00,89.86
09:21:06,0,12.88,0.00,5.62,0.03,0.00,0.02,1.27,0.00,80.18
and then parse it in memory. Once you get to the line break after that, you would know that you needed a new string containing the following:
12:08:06,CPU,%usr,%nice,%sys,%iowait,%steal,%irq,%soft,%guest,%idle
12:09:06,all,5.48,0.00,5.24,0.00,0.00,0.00,0.12,0.00,89.15
12:09:06,0,18.57,0.00,5.35,0.02,0.00,0.00,3.00,0.00,73.06
etc. - you can just keep going like this for as many tables as you have.
Upvotes: 1