Reputation: 613
From this question I know how to interpolate a time series with given timestamps. How can I do the reverse, that is, interpolate the timestamps against given values, to estimate the NaT entries in the following example?
interval datetime
0.782296 2012-11-19 12:40:10
0.795469 NaT
0.821426 2012-11-19 12:35:10
0.834957 NaT
0.864383 2012-11-19 12:30:10
0.906240 2012-11-19 12:25:10
P.S. I tried calling df['datetime'].interpolate() directly, but it failed.
Upvotes: 6
Views: 6134
Reputation: 851
A shorter variant of @montysporty's answer:
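>>> import time
>>> from datetime import datetime
>>> import numpy as np
>>> import pandas as pd
>>> # pd.datetime is gone from recent pandas versions, so datetime.datetime is used directly.
>>> # mktime reads local time and utcfromtimestamp returns UTC, so the result is shifted by the local UTC offset.
>>> df.set_index('interval', inplace=True)
>>> df['datetime'].map(lambda x: time.mktime(x.timetuple()) if not pd.isna(x) else np.nan)\
...     .interpolate('values')\
...     .map(datetime.utcfromtimestamp)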
interval
0.782296 2012-11-19 08:40:10.000000
0.795469 2012-11-19 08:38:29.005878
0.821426 2012-11-19 08:35:10.000000
0.834957 2012-11-19 08:33:35.503178
0.864383 2012-11-19 08:30:10.000000
0.906240 2012-11-19 08:25:10.000000
Name: datetime, dtype: datetime64[ns]
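The hours in this output differ from the 12:xx input because time.mktime interprets the datetimes as local time, while utcfromtimestamp converts back in UTC. Below is a minimal pandas-only sketch that avoids the round trip through epoch seconds. It assumes a frame shaped like the one in the question; the names origin, offset and filled are illustrative and not part of the original answer.

```python
import pandas as pd

# Rebuild the example frame from the question (NaT marks the unknown timestamps).
df = pd.DataFrame({
    "interval": [0.782296, 0.795469, 0.821426, 0.834957, 0.864383, 0.906240],
    "datetime": pd.to_datetime([
        "2012-11-19 12:40:10", None, "2012-11-19 12:35:10", None,
        "2012-11-19 12:30:10", "2012-11-19 12:25:10",
    ]),
})

# Express each timestamp as seconds relative to the earliest known one.
# NaT becomes NaN, which is what interpolate() fills.
origin = df["datetime"].min()
offset = (df["datetime"] - origin).dt.total_seconds()

# Interpolate against the numeric 'interval' values rather than row position.
offset.index = df["interval"]
filled = offset.interpolate(method="index")

# Convert the interpolated offsets back to timestamps; no timezone is involved.
df["datetime"] = origin + pd.to_timedelta(filled.to_numpy(), unit="s")
print(df)
```

Working with offsets from a known timestamp keeps the floats small and leaves the known rows unchanged.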
Upvotes: 1
Reputation: 94
This is straightforward with RedBlackPy, which is available for macOS and Linux on Python 3.
import redblackpy as rb
from datetime import datetime
# there is no need to keep the NaT entries:
# the Series can interpolate at keys that are not stored in the container
index = [0.782296, 0.821426, 0.864383, 0.906240]
values = [datetime(2012, 11, 19, 12, 40, 10),
          datetime(2012, 11, 19, 12, 35, 10),
          datetime(2012, 11, 19, 12, 30, 10),
          datetime(2012, 11, 19, 12, 25, 10)]
# init Series with a specific interpolation type (floor, ceil, nn, linear)
data = rb.Series(index=index, values=values, dtype='object',
                 interpolation='linear')
Now you can access by any key using interpolation!
# the index values where you want to interpolate
int_index = [0.795469, 0.834957]
# access a key that is not in the series
print( data[int_index[0]] ) # this prints 2012-11-19 12:38:29.005878
# you can change the interpolation type
data.set_interpolation('floor')
print( data[int_index[0]] ) # this prints 2012-11-19 12:40:10
To add the interpolated values to the Series, use insert or setitem as follows:
# this adds the interpolated values to data
for el in int_index:
    data[el] = data[el]
print(data)
Because the most recent interpolation type was 'floor', print(data) gives:
Series object Untitled
0.782296: 2012-11-19 12:40:10
0.795469: 2012-11-19 12:40:10
0.821426: 2012-11-19 12:35:10
0.834957: 2012-11-19 12:35:10
0.864383: 2012-11-19 12:30:10
0.90624: 2012-11-19 12:25:10
Upvotes: 1
Reputation: 2889
This seems to work. The code could probably be cleaned up a bit, but you get the gist of it.
from datetime import datetime
import pandas as pd
import time
#Create data
df = pd.DataFrame({'interval': [0.782296, 0.795469, 0.821426, 0.834957,
                                0.864383, 0.906240],
                   'datetime': [datetime(2012, 11, 19, 12, 40, 10), pd.NaT,
                                datetime(2012, 11, 19, 12, 35, 10), pd.NaT,
                                datetime(2012, 11, 19, 12, 30, 10),
                                datetime(2012, 11, 19, 12, 25, 10)
                                ]})
#Cast date to seconds (also recast the NaT to NaN)
#Note: mktime treats the datetimes as local time
df['seconds'] = [time.mktime(t.timetuple()) if t is not pd.NaT else float('nan') for t in df['datetime'] ]
#Set the interval as the index, as interpolation uses the index
df.set_index('interval', inplace=True)
#Use the 'values'-argument to actually use the values of the index and not the spacing
df['interpolated'] = df['seconds'].interpolate('values')
#Cast the interpolated seconds back to datetime
#Note: utcfromtimestamp returns UTC, so the result is shifted by the local UTC offset
df['datetime2'] = [datetime.utcfromtimestamp(t) for t in df['interpolated']]
#Clean up
df.reset_index(inplace=True)
df = df[['interval', 'datetime2']]
>>> df
Out[25]:
interval datetime2
0 0.782296 2012-11-19 11:40:10.000000
1 0.795469 2012-11-19 11:38:29.005878
2 0.821426 2012-11-19 11:35:10.000000
3 0.834957 2012-11-19 11:33:35.503178
4 0.864383 2012-11-19 11:30:10.000000
5 0.906240 2012-11-19 11:25:10.000000
Hope this is what you wanted.
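The same steps can also be written with NumPy alone. This is a rough sketch, not part of the answer above: fill_nat_linear is a hypothetical helper name, and it assumes the timestamps are timezone-naive. Note that np.interp does not extrapolate, so a NaT outside the range of known points gets the nearest known timestamp.

```python
import numpy as np

def fill_nat_linear(x, stamps):
    """Hypothetical helper: fill NaT entries of `stamps` by linear
    interpolation against the numeric values in `x`."""
    x = np.asarray(x, dtype="float64")
    stamps = np.asarray(stamps, dtype="datetime64[ns]")
    known = ~np.isnat(stamps)

    # Seconds relative to the earliest known timestamp keep the floats small.
    origin = stamps[known].min()
    secs = (stamps[known] - origin) / np.timedelta64(1, "s")

    # np.interp requires the known x values in increasing order.
    order = np.argsort(x[known])
    est = np.interp(x, x[known][order], secs[order])

    # Convert the estimated offsets back to timestamps (nanosecond resolution).
    out = origin + np.round(est * 1e9).astype("int64").astype("timedelta64[ns]")
    # Keep the original values wherever they were already known.
    return np.where(known, stamps, out)

interval = [0.782296, 0.795469, 0.821426, 0.834957, 0.864383, 0.906240]
stamps = np.array(
    ["2012-11-19T12:40:10", "NaT", "2012-11-19T12:35:10", "NaT",
     "2012-11-19T12:30:10", "2012-11-19T12:25:10"],
    dtype="datetime64[ns]",
)
result = fill_nat_linear(interval, stamps)
print(result)
```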
Upvotes: 1