Reputation: 362
I have a dataframe with a column "time" of floats representing days from 0 to 8, plus one more column with other data. The time step is irregular.
import random
import numpy as np
import pandas as pd

time_clean = np.arange(0, 8, 0.1)
# small random jitter so the time step is irregular
noise = [random.random()/10 for n in range(len(time_clean))]
time = time_clean + noise
data = [random.random()*100 for n in range(len(time_clean))]
df = pd.DataFrame({"time": time, "data": data})
df.head()
data time
0 89.965240 0.041341
1 95.964621 0.109215
2 70.552763 0.232596
3 74.457244 0.330750
4 13.228426 0.471623
I want to resample and interpolate the data to every 15 minutes (15/(60*24) days).
I think the most efficient way to do this would be using the resample method of pandas dataframes, but in order to do that I need to convert the time column into a timestamp and make it the index.
What is the most efficient way of doing this? Is it possible to transform an int to datetime?
Upvotes: 3
Views: 133
Reputation: 862711
I think you first need to convert the time column with to_timedelta, then use sort_values followed by resample:
I also think it is best to add one new row with 0 so that resample always starts from 0 (if 0 is not in the time column, it starts from the minimal time value).
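# anchor row at time 0 so the grid starts at 0 (this also sets data to 0)
df.loc[-1] = 0
# interpret the float days as timedeltas
df.time = pd.to_timedelta(df.time, unit='D')
# sort, index by time, and forward-fill onto a 15-minute grid
# ('15min' replaces the '15T' alias, which newer pandas versions deprecate)
df = df.sort_values('time').set_index('time').resample('15min').ffill()
print(df.head(20))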
data
time
00:00:00 0.000000
00:15:00 0.000000
00:30:00 0.000000
00:45:00 0.000000
01:00:00 0.000000
01:15:00 0.000000
01:30:00 50.869889
01:45:00 50.869889
02:00:00 50.869889
02:15:00 50.869889
02:30:00 50.869889
02:45:00 50.869889
03:00:00 50.869889
03:15:00 8.846017
03:30:00 8.846017
03:45:00 8.846017
04:00:00 8.846017
04:15:00 8.846017
04:30:00 8.846017
04:45:00 8.846017
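Note that ffill only carries the last observation forward; it does not interpolate between points. If linear interpolation is what you are after, a minimal sketch could look like the following. This is a swapped-in alternative, not the resample approach above: it runs np.interp on the float days and only builds the timedelta index at the end. The seeded sample data, the names grid_days and result, and the assumption that the grid should span exactly 0 to 8 days are mine, not from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data shaped like the question's frame (seeded for reproducibility)
rng = np.random.default_rng(0)
time_clean = np.arange(0, 8, 0.1)
df = pd.DataFrame({
    "time": time_clean + rng.random(len(time_clean)) / 10,  # float days, irregular step
    "data": rng.random(len(time_clean)) * 100,
})

# np.interp needs increasing x values, so sort by time first
df = df.sort_values("time")

# Regular 15-minute grid from 0 to 8 days inclusive (assumed range)
n = 8 * 24 * 4 + 1                          # number of 15-minute steps, plus the start point
grid_days = np.arange(n) * 15 / (60 * 24)   # grid expressed in days

# Linear interpolation of the data onto the grid
values = np.interp(grid_days, df["time"].to_numpy(), df["data"].to_numpy())

# Attach a TimedeltaIndex so the result looks like the resampled output
index = pd.timedelta_range(start=pd.Timedelta(0), periods=n, freq="15min")
result = pd.Series(values, index=index, name="data")

print(result.head())
```

Outside the observed time range np.interp holds the first or last value, so no extra row at 0 is needed with this variant.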
Upvotes: 3