Reputation: 273
I've been getting the hang of python for the past week, and finally got something working, but could use some help speeding it up
The function takes a .CSV log off a vehicle CAN Bus, and reduces it down to a list of records that match a set of message ID's, and a set of what Message ID's where encountered.
The files are 500,000 lines to 50,000,000 lines. currently it takes about 3.2uS per line in my laptop.
The CSV file lines look like:
Time [s],Packet,Type,Identifier,Control,Data,CRC,ACK
0.210436250000000,0,DATA,0x0CFAE621,0x8,0x02 0x50 0x00 0x00 0x04 0x01 0x00 0x29,0x19A8,NAK
...
...
52.936353750000002,15810,DATA,0x18FC07F4,0x8,0xF0 0x09 0x00 0x00 0xCE 0x03 0x92 0x20,0x0C47,ACK
So the 4th entry "0x0CFAE621" is the Message ID, and the 6th entry "0xF0 0x09 0x00 0x00 0xCE 0x03 0x92 0x20" is the data
This is masked with 0x00FFFF00 and if matched saved as [0xFAE600,'F0','09','00','00','CE','03','92','20'], Though Ideally I would like to convert all that data to int at this point, it seemed very slow wrapping each with int() (was thinking I might be able to improve it with a dict to do the Hex-Int Conversion, but I was not sure on how to do that)
The len() and If tree is because the message data can be empty to 8 records, again I feel there is probably a better way to accomplish this.
from tkinter import filedialog
from tkinter import Tk
import timeit
Tk().withdraw()
filename = filedialog.askopenfile(title="Select .csv log file", filetypes=(("CSV files", "*.csv"), ("all files", "*.*")))
if not filename:
print("No File Selected")
else:
CanIdentifiers = set()
CanRecordData = []
IdentifierList = {0x00F00100,0x00F00400,0x00FC0800,0x00FE4000,0x00FE4E00,0x00FE5A00,0x00FE6E00,0x00FEC100,0x00FEC300,0x00FECA00,0x00FEF100}
mask = 0x00FFFF00
loopcount = 0
error = 0
csvtype = 0
start_time = timeit.default_timer()
for line in filename.readlines():
message = line.split(',')
if csvtype == 1:
if message[2] == "DATA":
messageidentifier = int(message[3], 16) & mask
if messageidentifier not in CanIdentifiers:
CanIdentifiers.add(messageidentifier)
if messageidentifier in IdentifierList:
messagedata = message[5].split("0x")
size1 = len(messagedata)
if size1 == 2:
CanRecordData.append((messageidentifier, messagedata[1]))
if size1 == 3:
CanRecordData.append((messageidentifier, messagedata[1], messagedata[2]))
if size1 == 4:
CanRecordData.append((messageidentifier, messagedata[1], messagedata[2], messagedata[3]))
if size1 == 5:
CanRecordData.append((messageidentifier, messagedata[1], messagedata[2], messagedata[3], messagedata[4]))
if size1 == 6:
CanRecordData.append((messageidentifier, messagedata[1], messagedata[2], messagedata[3], messagedata[4], messagedata[5]))
if size1 == 7:
CanRecordData.append((messageidentifier, messagedata[1], messagedata[2], messagedata[3], messagedata[4], messagedata[5], messagedata[6]))
if size1 == 8:
CanRecordData.append((messageidentifier, messagedata[1], messagedata[2], messagedata[3], messagedata[4], messagedata[5], messagedata[6], messagedata[7]))
if size1 == 9:
CanRecordData.append((messageidentifier, messagedata[1], messagedata[2], messagedata[3], messagedata[4], messagedata[5], messagedata[6], messagedata[7], messagedata[8]))
if csvtype == 0:
if message[0] == "Time [s]":
csvtype = 1
error += 1
if error == 50:
break
loopcount += 1
readtime = (timeit.default_timer() - start_time) * 1000000
print(loopcount, "Records Processed at", readtime/loopcount, "uS per Record")
Upvotes: 2
Views: 948
Reputation: 249093
Pandas' read_csv()
will give you a DataFrame:
Time [s] Packet Type Identifier Control Data CRC ACK
0 0.210436 0 DATA 0x0CFAE621 0x8 0x02 0x50 0x00 0x00 0x04 0x01 0x00 0x29 0x19A8 NAK
1 52.936354 15810 DATA 0x18FC07F4 0x8 0xF0 0x09 0x00 0x00 0xCE 0x03 0x92 0x20 0x0C47 ACK
Then, split the data bytes if you want:
import pandas as pd
df = pd.read_csv('t.csv')
df.Data.str.split(expand=True)
Which gives you:
0 1 2 3 4 5 6 7
0 0x02 0x50 0x00 0x00 0x04 0x01 0x00 0x29
1 0xF0 0x09 0x00 0x00 0xCE 0x03 0x92 0x20
This will be much, much faster than Python loops, and the storage will be more compact as well--especially if you parse the hex numbers into actual integers: convert pandas dataframe column from hex string to int
Upvotes: 3