I have some timeseries data where the ids of rows with type 'RX' or 'CRC' have unfortunately wrapped around; for simplicity, in this example they wrap after reaching 1. That means I end up with multiple 0 and 1 ids when they should match the 'TX' ids running from 0 to 4. The 'TX' ids are always correct.
What I want to do is group by a 10ms time window starting at each TX row, and fix the id column based on the id of that TX row. For example:
From:
time type id
0 2020-01-01 10:33:00.000 TX 0
1 2020-01-01 10:34:00.500 TX 1
2 2020-01-01 10:34:00.000 TX 2
3 2020-01-01 10:34:00.007 CRC 2
4 2020-01-01 10:34:00.009 RX 2
5 2020-01-01 10:34:00.027 RX 2
6 2020-01-01 10:34:00.047 RX 2
7 2020-01-01 10:34:00.012 TX 3
8 2020-01-01 10:34:00.013 RX 3
9 2020-01-01 10:34:00.038 RX 3
10 2020-01-01 10:34:00.026 TX 4
11 2020-01-01 10:34:00.036 TX 5
12 2020-01-01 10:34:00.041 TX 6
To:
time type id
0 2020-01-01 10:33:00.000 TX 0
1 2020-01-01 10:34:00.500 TX 1
2 2020-01-01 10:34:00.000 TX 2
3 2020-01-01 10:34:00.007 CRC 2
4 2020-01-01 10:34:00.009 RX 2
5 2020-01-01 10:34:00.012 TX 3
6 2020-01-01 10:34:00.013 RX 3
7 2020-01-01 10:34:00.026 TX 4
8 2020-01-01 10:34:00.027 RX 4
9 2020-01-01 10:34:00.036 TX 5
10 2020-01-01 10:34:00.038 RX 5
11 2020-01-01 10:34:00.041 TX 6
12 2020-01-01 10:34:00.047 RX 6
MWE:
import pandas as pd

s1 = pd.to_datetime(['20200101 10:33:00.000',
'20200101 10:34:00.500',
'20200101 10:34:00.000', '20200101 10:34:00.007', '20200101 10:34:00.009', '20200101 10:34:00.027', '20200101 10:34:00.047',
'20200101 10:34:00.012', '20200101 10:34:00.013', '20200101 10:34:00.038',
'20200101 10:34:00.026',
'20200101 10:34:00.036',
'20200101 10:34:00.041'])
d1 = {'time': s1,
'type': ['TX', 'TX', 'TX', 'CRC', 'RX', 'RX', 'RX', 'TX', 'RX', 'RX', 'TX', 'TX', 'TX'],
'id': [0, 1, 2, 2, 2, 2, 2, 3, 3, 3, 4, 5, 6]}
df1 = pd.DataFrame(data=d1)
print(df1)
s2 = pd.to_datetime(['20200101 10:33:00.000',
'20200101 10:34:00.500',
'20200101 10:34:00.000', '20200101 10:34:00.007', '20200101 10:34:00.009',
'20200101 10:34:00.012', '20200101 10:34:00.013',
'20200101 10:34:00.026', '20200101 10:34:00.027',
'20200101 10:34:00.036', '20200101 10:34:00.038',
'20200101 10:34:00.041', '20200101 10:34:00.047'])
d2 = {'time': s2,
'type': ['TX', 'TX', 'TX', 'CRC', 'RX', 'TX', 'RX', 'TX', 'RX', 'TX', 'RX', 'TX', 'RX'],
'id': [0, 1, 2, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6]}
df2 = pd.DataFrame(data=d2)
print(df2)
Sorting by time and then numbering each row by the cumulative count of 'TX' rows looks like it would solve your problem:
df1 = df1.sort_values('time')
df1['id'] = (df1['type'] == 'TX').cumsum() - 1
Output:
time type id
0 2020-01-01 10:33:00.000 TX 0
2 2020-01-01 10:34:00.000 TX 1
3 2020-01-01 10:34:00.007 CRC 1
4 2020-01-01 10:34:00.009 RX 1
7 2020-01-01 10:34:00.012 TX 2
8 2020-01-01 10:34:00.013 RX 2
10 2020-01-01 10:34:00.026 TX 3
5 2020-01-01 10:34:00.027 RX 3
11 2020-01-01 10:34:00.036 TX 4
9 2020-01-01 10:34:00.038 RX 4
12 2020-01-01 10:34:00.041 TX 5
6 2020-01-01 10:34:00.047 RX 5
1 2020-01-01 10:34:00.500 TX 6
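One caveat: the cumsum renumbers every row in time order, so the TX ids themselves get rewritten; in the question's data the TX with id 1 (at 10:34:00.500) sorts last and ends up with id 6. If the original TX ids must be preserved exactly (as in the desired df2), a sketch using pd.merge_asof, assuming every RX/CRC row falls within 10ms after its TX row, could look like this:

```python
import pandas as pd

# Rebuild df1 from the question.
df1 = pd.DataFrame({
    'time': pd.to_datetime([
        '20200101 10:33:00.000', '20200101 10:34:00.500',
        '20200101 10:34:00.000', '20200101 10:34:00.007',
        '20200101 10:34:00.009', '20200101 10:34:00.027',
        '20200101 10:34:00.047', '20200101 10:34:00.012',
        '20200101 10:34:00.013', '20200101 10:34:00.038',
        '20200101 10:34:00.026', '20200101 10:34:00.036',
        '20200101 10:34:00.041']),
    'type': ['TX', 'TX', 'TX', 'CRC', 'RX', 'RX', 'RX',
             'TX', 'RX', 'RX', 'TX', 'TX', 'TX'],
    'id': [0, 1, 2, 2, 2, 2, 2, 3, 3, 3, 4, 5, 6]})

# TX rows keep their (correct) ids; every other row takes the id of
# the nearest TX at or before it, within a 10ms window.
tx = df1[df1['type'] == 'TX'].sort_values('time')
rest = df1[df1['type'] != 'TX'].sort_values('time')
rest = pd.merge_asof(rest.drop(columns='id'), tx[['time', 'id']],
                     on='time', direction='backward',
                     tolerance=pd.Timedelta('10ms'))
result = pd.concat([tx, rest]).sort_values('time').reset_index(drop=True)
print(result)
```

merge_asof needs both frames sorted on the key; direction='backward' picks the nearest earlier TX, and the tolerance leaves id as NaN for any row with no TX in the preceding 10ms, so stray rows are easy to spot.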
Upvotes: 1