Reputation: 920
I have some data in the following format:
id 1234 5678 9876
date
2017-10-22 11:12:21.926000000 NaN NaN NaN
2017-10-22 11:16:34.589000000 NaN 5 NaN
2017-10-22 11:20:45.192000000 NaN NaN 10
2017-10-22 11:22:22.361000000 12 NaN NaN
2017-10-22 11:25:25.034000000 NaN 8 NaN
I want to clean this data into proper minute by minute time series and carry forwad the values for every minute till a new value is found for the ID in the time, "for the needed time range".
Output:
id 1234 5678 9876
date
2017-10-22 11:12:00 NaN NaN NaN
2017-10-22 11:13:00 NaN NaN NaN
2017-10-22 11:14:00 NaN NaN NaN
2017-10-22 11:15:00 NaN NaN NaN
2017-10-22 11:16:00 NaN 5 NaN
2017-10-22 11:17:00 NaN 5 NaN
2017-10-22 11:18:00 NaN 5 NaN
2017-10-22 11:19:00 NaN 5 NaN
2017-10-22 11:20:00 NaN 5 10
2017-10-22 11:21:00 NaN 5 10
2017-10-22 11:22:00 12 5 10
2017-10-22 11:23:00 12 5 10
2017-10-22 11:24:00 12 5 10
2017-10-22 11:25:00 12 8 10
2017-10-22 11:26:00 12 8 10
I have been trying to create new dataframe and merge it with the old one but not with much luck. Thanks for any suggestions.
Upvotes: 0
Views: 47
Reputation: 476
Assuming 'd' is the dataframe you show as your starting point then:
d.date = pd.to_datetime(d.date).values.astype('<M8[m]')
d = d.set_index('date')
d.resample('1Min').ffill().fillna(method='ffill')
You have to 'chop off' the seconds and fractions of seconds by calling .values.astype('<M8[m]')
because ffill
rounds up even for a few seconds. Also the two ffill
calls seem weird but if you try with just one you will see that things are filled in only partially until another non-NaN value shows up in any column.
And also make sure 'date' is set as the index.
Upvotes: 1