Reputation: 79
I'm doing some resampling on data and I was wondering why resampling 1min data to 5min data creates MORE time intervals than my original dataset? Also, why does t resample until 2018-12-11 (11 days longer!) than the original datset?
1-min data:
result of resampling to 5-min intervalls:
This is how I do the resampling:
df1.loc[:,'qKfz_gesamt'].resample('5min').mean()
Upvotes: 1
Views: 78
Reputation: 863146
I was wondering why resampling 1min data to 5min data creates MORE time intervals than my original dataset?
Problem is if no consecutive values in original pandas create consecutive 5minutes intervals and for not exist values are created NaNs
:
df1 = pd.DataFrame({'qKfz_gesamt': range(4)},
index=pd.to_datetime(['2018-11-25 00:00:00','2018-11-25 00:01:00',
'2018-11-25 00:02:00','2018-11-25 00:15:00']))
print (df1)
qKfz_gesamt
2018-11-25 00:00:00 0
2018-11-25 00:01:00 1
2018-11-25 00:02:00 2
2018-11-25 00:15:00 3
print (df1['qKfz_gesamt'].resample('5min').mean())
2018-11-25 00:00:00 1.0
2018-11-25 00:05:00 NaN
2018-11-25 00:10:00 NaN
2018-11-25 00:15:00 3.0
Freq: 5T, Name: qKfz_gesamt, dtype: float64
print (df1['qKfz_gesamt'].resample('5min').mean().dropna())
2018-11-25 00:00:00 1.0
2018-11-25 00:15:00 3.0
Name: qKfz_gesamt, dtype: float64
why does t resample until 2018-12-11 (11 days longer!) than the original datset?
You need filter by maximal value of index:
rng = pd.date_range('2018-11-25', periods=10)
df1 = pd.DataFrame({'a': range(10)}, index=rng)
print (df1)
a
2018-11-25 0
2018-11-26 1
2018-11-27 2
2018-11-28 3
2018-11-29 4
2018-11-30 5
2018-12-01 6
2018-12-02 7
2018-12-03 8
2018-12-04 9
df1 = df1.loc[:'2018-11-30']
print (df1)
a
2018-11-25 0
2018-11-26 1
2018-11-27 2
2018-11-28 3
2018-11-29 4
2018-11-30 5
Or:
df1 = df1.loc[df1.index <= '2018-11-30']
print (df1)
a
2018-11-25 0
2018-11-26 1
2018-11-27 2
2018-11-28 3
2018-11-29 4
2018-11-30 5
Upvotes: 1