Reputation: 183
I have a .dat file made by an FPGA. The file contains three columns: the first is the input channel (it can be 1 or 2), the second is the timestamp at which an event occurred, and the third is the local time at which the same event occurred. The third column is necessary because sometimes the FPGA has to reset its clock counter, so the counter does not count in a continuous way. An example of what I mean is shown in the next figure.
An example of some lines from the .dat file is the following:
1 80.80051152 2022-02-24T18:28:49.602000
2 80.91821978 2022-02-24T18:28:49.716000
1 80.94284154 2022-02-24T18:28:49.732000
2 0.01856876 2022-02-24T18:29:15.068000
2 0.04225772 2022-02-24T18:29:15.100000
2 0.11766780 2022-02-24T18:29:15.178000
The time column is given by the FPGA (in seconds, with a resolution of tens of nanoseconds); the date column is written by the Python script that listens to the data from the FPGA: whenever it writes a timestamp, it also saves the local time as a date.
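Schematically, the writing part of the script does something like this (simplified sketch, not the actual code; the function name and arguments are made up for illustration):

from datetime import datetime

def write_event(f, channel, timestamp_s):
    # timestamp_s: FPGA counter value already converted to seconds
    # (10 ns resolution -> 8 decimal places)
    local = datetime.now().isoformat(timespec='microseconds')
    f.write(f"{channel} {timestamp_s:.8f} {local}\n")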
I am interested in getting two arrays (one per channel) where, for each event, I have the time at which that event occurred relative to the starting time of the acquisition. An example of how the data given above should look at the end is the following:
8.091821978000000115e+01
1.062702197800000050e+02
1.062939087400000062e+02
1.063693188200000179e+02
These data refer to the second channel only. A double check can be made by looking at the third column of the previous data.
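For example, the second value is obtained by adding, to the last timestamp before the reset, the local-time gap across it: 80.91821978 s + (18:29:15.068 − 18:28:49.716) = 80.91821978 s + 25.352 s = 106.27021978 s, which matches the second line above.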
I tried to achieve this with a function (too messy for my taste) where each time I check whether the difference between two consecutive timestamps disagrees by more than 1 second with the difference in local time; if that is the case, I evaluate the time interval through the local-time column and correct the timestamp by the right amount:
import numpy as np

ch, time, date = np.genfromtxt("events220302_1d.dat", unpack=True,
                               dtype=(int, float, 'datetime64[ms]'))

# Split the data by input channel
mask1 = ch == 1
mask2 = ch == 2
time1 = time[mask1]
time2 = time[mask2]
date1 = date[mask1]
date2 = date[mask2]

corr1 = np.zeros(len(time1))
for idx, val in enumerate(time1):
    if idx < len(time1) - 1:
        if check_dif(time1[idx], time1[idx+1], date1[idx], date1[idx+1]) == 0:
            # A reset happened: take the elapsed time from the local-time
            # column instead of the (restarted) FPGA counter
            corr1[idx+1] = val + (date1[idx+1] - date1[idx]) / np.timedelta64(1, 's') - time1[idx+1]
time1 = time1 + corr1.cumsum()
# (the same treatment is then applied to time2/date2)
Here check_dif is a function that returns 0 if the difference in time between consecutive events is inconsistent with the difference in date between the same two events, as I said before.
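Schematically, check_dif does something like this (simplified sketch with a 1-second tolerance, not the actual code; only the zero/non-zero distinction matters in the loop above):

def check_dif(t0, t1, d0, d1, tol=1.0):
    # Return 0 when the timestamp step disagrees with the local-time
    # step by more than `tol` seconds (i.e. a counter reset happened
    # between the two events), 1 otherwise
    dt_time = t1 - t0
    dt_date = (d1 - d0) / np.timedelta64(1, 's')
    return 0 if abs(dt_time - dt_date) > tol else 1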
Is there a more elegant or even faster way to get what I want, maybe with some fancy NumPy coding?
Upvotes: 1
Views: 95
Reputation: 31
A simple first way to optimize your code is to make it if-less, thus getting rid of both if statements. To do so, instead of returning 0 in check_dif, you can return 1 when "the difference in time between consecutive events is inconsistent with the difference in date between the two same events" (as you said), and 0 otherwise.
Your for loop will then look something like this:
for idx in range(len(time1) - 1):
    is_dif = check_dif(time1[idx], time1[idx+1], date1[idx], date1[idx+1])
    # Correction value: if is_dif == 0, no correction; otherwise the
    # correction takes place
    corr1[idx+1] = is_dif * (time1[idx]
                             + (date1[idx+1] - date1[idx]) / np.timedelta64(1, 's')
                             - time1[idx+1])
A more NumPy-like way to do things would be through full vectorization. I don't know whether you have some benchmark on the speed or how big the file is, but I think that in your case the previous change should be good enough.
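If you do want to go fully vectorized, a sketch could be the following (assuming the same 1-second tolerance as in check_dif; adapt it to your actual consistency check):

dt_time = np.diff(time1)                           # FPGA timestamp steps
dt_date = np.diff(date1) / np.timedelta64(1, 's')  # local-time steps
# indices i where the counter reset between event i and event i+1
reset = np.flatnonzero(np.abs(dt_time - dt_date) > 1.0)

corr1 = np.zeros(len(time1))
# at each reset, replace the meaningless counter step with the step
# measured by the local clock
corr1[reset + 1] = time1[reset] + dt_date[reset] - time1[reset + 1]
time1 = time1 + corr1.cumsum()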
Upvotes: 3