Vivek
Vivek

Reputation: 920

Pandas timeseries manipulation

I have some data in the following format:

id  1234    5678    9876    
date                                                                                    
2017-10-22 11:12:21.926000000   NaN     NaN     NaN     
2017-10-22 11:16:34.589000000   NaN     5       NaN
2017-10-22 11:20:45.192000000   NaN     NaN     10 
2017-10-22 11:22:22.361000000   12      NaN     NaN 
2017-10-22 11:25:25.034000000   NaN     8       NaN 

I want to clean this data into proper minute by minute time series and carry forwad the values for every minute till a new value is found for the ID in the time, "for the needed time range".

Output:

id  1234    5678    9876    
date                                                                                    
2017-10-22 11:12:00     NaN     NaN     NaN     
2017-10-22 11:13:00     NaN     NaN     NaN     
2017-10-22 11:14:00     NaN     NaN     NaN     
2017-10-22 11:15:00     NaN     NaN     NaN     
2017-10-22 11:16:00     NaN     5       NaN
2017-10-22 11:17:00     NaN     5       NaN
2017-10-22 11:18:00     NaN     5       NaN
2017-10-22 11:19:00     NaN     5       NaN
2017-10-22 11:20:00     NaN     5       10 
2017-10-22 11:21:00     NaN     5       10 
2017-10-22 11:22:00     12      5       10 
2017-10-22 11:23:00     12      5       10 
2017-10-22 11:24:00     12      5       10 
2017-10-22 11:25:00     12      8       10 
2017-10-22 11:26:00     12      8       10 

I have been trying to create new dataframe and merge it with the old one but not with much luck. Thanks for any suggestions.

Upvotes: 0

Views: 47

Answers (1)

Tasko Olevski
Tasko Olevski

Reputation: 476

Assuming 'd' is the dataframe you show as your starting point then:

d.date = pd.to_datetime(d.date).values.astype('<M8[m]')
d = d.set_index('date')
d.resample('1Min').ffill().fillna(method='ffill')

You have to 'chop off' the seconds and fractions of seconds by calling .values.astype('<M8[m]') because ffill rounds up even for a few seconds. Also the two ffill calls seem weird but if you try with just one you will see that things are filled in only partially until another non-NaN value shows up in any column.

And also make sure 'date' is set as the index.

Upvotes: 1

Related Questions