Given the following hourly DataFrame:
column1
datetime
2016-08-09 19:00:00 1
2016-08-09 20:00:00 2
2016-08-10 06:00:00 3
2016-08-10 07:00:00 4
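For reference, here is a minimal sketch that rebuilds this sample frame so the snippets below can be run. The variable name data1hour comes from the question's code, and the index name "datetime" is taken from the printed output.

```python
import pandas as pd

# Hourly sample data from the question; the index is named "datetime"
# to match the printed output above.
data1hour = pd.DataFrame(
    {"column1": [1, 2, 3, 4]},
    index=pd.DatetimeIndex(
        ["2016-08-09 19:00", "2016-08-09 20:00",
         "2016-08-10 06:00", "2016-08-10 07:00"],
        name="datetime",
    ),
)
print(data1hour)
```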
When I try to upsample the data to 10-minute intervals with this method:
data10min = data1hour.column1.resample("10min").ffill()  # .pad() in older pandas versions
I get the following result.
column1
datetime
2016-08-09 19:00:00 1
2016-08-09 19:10:00 1
2016-08-09 19:20:00 1
2016-08-09 19:30:00 1
2016-08-09 19:40:00 1
2016-08-09 19:50:00 1
2016-08-09 20:00:00 2
2016-08-09 20:10:00 2
2016-08-09 20:20:00 2
2016-08-09 20:30:00 2
2016-08-09 20:40:00 2
2016-08-09 20:50:00 2
2016-08-09 21:00:00 2
....
2016-08-10 04:40:00 2
2016-08-10 04:50:00 2
2016-08-10 05:00:00 2
2016-08-10 05:10:00 2
2016-08-10 05:20:00 2
2016-08-10 05:30:00 2
2016-08-10 05:40:00 2
2016-08-10 05:50:00 2
2016-08-10 06:00:00 3
2016-08-10 06:10:00 3
2016-08-10 06:20:00 3
2016-08-10 06:30:00 3
2016-08-10 06:40:00 3
2016-08-10 06:50:00 3
2016-08-10 07:00:00 4
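A quick check, assuming a pandas version where .ffill() is the upsampling fill method, shows why this happens: resample lays one regular 10-minute grid over the whole span, gap included, and every slot is forward-filled.

```python
import pandas as pd

data1hour = pd.DataFrame(
    {"column1": [1, 2, 3, 4]},
    index=pd.DatetimeIndex(
        ["2016-08-09 19:00", "2016-08-09 20:00",
         "2016-08-10 06:00", "2016-08-10 07:00"],
        name="datetime",
    ),
)

# Same upsampling as in the question (.ffill() is the newer name of .pad()).
data10min = data1hour["column1"].resample("10min").ffill()

# One regular grid from 19:00 to 07:00: 12 hours * 6 slots + 1 = 73 rows,
# so the overnight hours are filled with the last value before the gap.
print(len(data10min))                                    # 73
print(data10min.loc[pd.Timestamp("2016-08-10 05:50")])   # 2
```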
The problem is that it also fills the gap in the index between 2016-08-09 20:00:00 and 2016-08-10 06:00:00.
I am looking for the following result, but cannot find an efficient way to achieve it. There must be a simple way to upsample without filling the gaps in the datetime index.
column1
datetime
2016-08-09 19:00:00 1
2016-08-09 19:10:00 1
2016-08-09 19:20:00 1
2016-08-09 19:30:00 1
2016-08-09 19:40:00 1
2016-08-09 19:50:00 1
2016-08-09 20:00:00 2
2016-08-09 20:10:00 2
2016-08-09 20:20:00 2
2016-08-09 20:30:00 2
2016-08-09 20:40:00 2
2016-08-09 20:50:00 2
2016-08-10 06:00:00 3
2016-08-10 06:10:00 3
2016-08-10 06:20:00 3
2016-08-10 06:30:00 3
2016-08-10 06:40:00 3
2016-08-10 06:50:00 3
2016-08-10 07:00:00 4
One more thing: the upsampling should work for any timeframe that has gaps, for example from 1D with gaps to 1H with gaps, or from 5min with gaps to 1min with gaps.
You need a clear definition of what a gap is. Assuming that the interval in your example is a constant 1 hour, anything longer than that is a gap.
Under that assumption, first reindexing to a complete hourly index (which inserts NaN rows for the missing hours), then resampling to 10min with a forward fill, and finally dropping the NaN rows will do the job.
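import pandas as pd
idx = pd.date_range(start=df.index[0], end=df.index[-1], freq='1h')  # complete hourly grid; missing hours become NaN
df.reindex(idx).resample('10min').ffill().dropna()  # upsample, then drop the slots that came from NaN rows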
column1
2016-08-09 19:00:00 1.0
2016-08-09 19:10:00 1.0
2016-08-09 19:20:00 1.0
2016-08-09 19:30:00 1.0
2016-08-09 19:40:00 1.0
2016-08-09 19:50:00 1.0
2016-08-09 20:00:00 2.0
2016-08-09 20:10:00 2.0
2016-08-09 20:20:00 2.0
2016-08-09 20:30:00 2.0
2016-08-09 20:40:00 2.0
2016-08-09 20:50:00 2.0
2016-08-10 06:00:00 3.0
2016-08-10 06:10:00 3.0
2016-08-10 06:20:00 3.0
2016-08-10 06:30:00 3.0
2016-08-10 06:40:00 3.0
2016-08-10 06:50:00 3.0
2016-08-10 07:00:00 4.0
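A self-contained version of the same three steps, with each stage named, may make the mechanics easier to follow. It assumes a recent pandas, where pd.date_range replaces the old DatetimeIndex(start=..., end=...) constructor and .ffill() replaces .pad().

```python
import pandas as pd

df = pd.DataFrame(
    {"column1": [1, 2, 3, 4]},
    index=pd.DatetimeIndex(
        ["2016-08-09 19:00", "2016-08-09 20:00",
         "2016-08-10 06:00", "2016-08-10 07:00"]
    ),
)

# Step 1: reindex to a complete hourly grid; the missing hours become NaN rows.
idx = pd.date_range(start=df.index[0], end=df.index[-1], freq="1h")
hourly = df.reindex(idx)

# Step 2: upsample to 10 minutes. ffill copies each hourly row (including
# the NaN rows) into the 10-minute slots that follow it.
upsampled = hourly.resample("10min").ffill()

# Step 3: drop the slots that were filled from NaN rows, i.e. the gap.
result = upsampled.dropna()
print(result)
```

The forward fill propagates the value of the previous row label, not the last non-missing value, which is why the NaN rows inserted by the reindex survive the upsampling and can be removed by dropna().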
In the example above I assume that your original DataFrame is sorted, so taking the first and last index values covers the entire range. You could also use the min and max of the index, or a custom start and end date.
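The reindexing changes the dtype to float because the missing hours are filled with NaN, which an integer column cannot hold. If you need integers back, call .astype(int) after dropna().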
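To cover the "any timeframe" requirement from the question (1D to 1H, 5min to 1min, etc.), the same idea can be wrapped in a small helper. This is a sketch under stated assumptions: upsample_without_gaps is a hypothetical name, and the base frequency has to be passed in explicitly because pd.infer_freq returns None for an index with gaps. It uses min/max instead of first/last, so the input does not need to be sorted, and it restores the original column dtypes at the end.

```python
import pandas as pd


def upsample_without_gaps(df, base_freq, target_freq):
    """Forward-fill df from base_freq to target_freq, leaving gaps empty.

    Hypothetical helper. base_freq is the regular spacing you expect
    between rows (e.g. "1h", "1D", "5min"); any longer interval between
    rows is treated as a gap.
    """
    # Complete grid at the base frequency; min/max avoids relying on sort order.
    full = pd.date_range(df.index.min(), df.index.max(), freq=base_freq)
    out = (
        df.reindex(full)           # missing base periods become NaN rows
        .resample(target_freq)
        .ffill()                   # NaN rows are copied into their slots too
        .dropna()                  # ...and removed here
    )
    # NaN forced an upcast to float; restore the original column dtypes.
    return out.astype(df.dtypes.to_dict())


# Usage: 5-minute data with no rows at 00:10 and 00:15, upsampled to 1 minute.
five_min = pd.DataFrame(
    {"v": [10, 20, 30]},
    index=pd.DatetimeIndex(
        ["2021-01-01 00:00", "2021-01-01 00:05", "2021-01-01 00:20"]
    ),
)
one_min = upsample_without_gaps(five_min, "5min", "1min")
print(one_min)  # 00:00-00:09 and 00:20 only, as integers
```

Two caveats of this sketch: the last observation only gets its own slot (07:00 in the question, 00:20 here), which matches the desired result shown in the question; and dropna() also removes rows whose original values were NaN.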