N.Foe
Reputation: 79

Pandas downsampling creates more time intervals?

I'm resampling some data and I was wondering why resampling 1-min data to 5-min data creates MORE time intervals than my original dataset has. Also, why does it resample until 2018-12-11 (11 days longer!) than the original dataset?

1-min data:

(screenshot: original 1-min data)

Result of resampling to 5-min intervals:

(screenshot: result of resampling)

This is how I do the resampling:

df1.loc[:,'qKfz_gesamt'].resample('5min').mean()

Upvotes: 1

Views: 78

Answers (1)

jezrael
Reputation: 863146

I was wondering why resampling 1min data to 5min data creates MORE time intervals than my original dataset?

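The problem is that the timestamps in the original data are not consecutive. resample creates a continuous grid of 5-minute intervals between the first and the last timestamp, and intervals that contain no original values are filled with NaN: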

import pandas as pd

# three consecutive minutes, then a 13-minute gap
df1 = pd.DataFrame({'qKfz_gesamt': range(4)},
                   index=pd.to_datetime(['2018-11-25 00:00:00','2018-11-25 00:01:00',
                                         '2018-11-25 00:02:00','2018-11-25 00:15:00']))
print(df1)
                     qKfz_gesamt
2018-11-25 00:00:00            0
2018-11-25 00:01:00            1
2018-11-25 00:02:00            2
2018-11-25 00:15:00            3

print (df1['qKfz_gesamt'].resample('5min').mean())
2018-11-25 00:00:00    1.0
2018-11-25 00:05:00    NaN
2018-11-25 00:10:00    NaN
2018-11-25 00:15:00    3.0
Freq: 5T, Name: qKfz_gesamt, dtype: float64

print (df1['qKfz_gesamt'].resample('5min').mean().dropna())
2018-11-25 00:00:00    1.0
2018-11-25 00:15:00    3.0
Name: qKfz_gesamt, dtype: float64
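
To make the "more rows than before" effect visible, it can help to count how many bins resample spans and how many of them are empty. This is only a minimal sketch on made-up data (three rows with a one-hour gap); the names sample, n_bins and n_empty are illustrative and not from the question:

```python
import pandas as pd

# Hypothetical toy data: 3 rows, with a gap of almost one hour
sample = pd.DataFrame(
    {'qKfz_gesamt': [0, 1, 2]},
    index=pd.to_datetime(['2018-11-25 00:00:00',
                          '2018-11-25 00:01:00',
                          '2018-11-25 01:00:00']))

resampler = sample['qKfz_gesamt'].resample('5min')
counts = resampler.count()   # number of original rows falling into each bin
means = resampler.mean()     # NaN for bins without any original row

n_bins = len(means)                  # bins from 00:00 up to and including 01:00
n_empty = int((counts == 0).sum())   # bins that received no data

print(len(sample), n_bins, n_empty)  # 3 13 11
```

Here 3 input rows become 13 output rows, 11 of which are NaN. Calling dropna() on the result, as shown above, removes those empty intervals again.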

why does it resample until 2018-12-11 (11 days longer!) than the original dataset?

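resample spans from the earliest to the latest timestamp in the index, so if the result runs until 2018-12-11, the index contains at least one timestamp that late. You need to filter by the maximal value of the index that you want to keep: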

rng = pd.date_range('2018-11-25', periods=10)
df1 = pd.DataFrame({'a': range(10)}, index=rng)  
print (df1)
            a
2018-11-25  0
2018-11-26  1
2018-11-27  2
2018-11-28  3
2018-11-29  4
2018-11-30  5
2018-12-01  6
2018-12-02  7
2018-12-03  8
2018-12-04  9

df1 = df1.loc[:'2018-11-30']
print (df1)
            a
2018-11-25  0
2018-11-26  1
2018-11-27  2
2018-11-28  3
2018-11-29  4
2018-11-30  5

Or:

df1 = df1.loc[df1.index <= '2018-11-30']
print (df1)
            a
2018-11-25  0
2018-11-26  1
2018-11-27  2
2018-11-28  3
2018-11-29  4
2018-11-30  5
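
Before filtering, it may be worth checking what the index actually contains. The following is a sketch under the assumption that a single stray or unsorted timestamp is what stretches the result; the data and the names df, out and trimmed are hypothetical, since the real dataset is not shown in the question:

```python
import pandas as pd

# Hypothetical data: one stray late timestamp in the middle of the frame
idx = pd.to_datetime(['2018-11-25 00:00', '2018-11-25 00:01',
                      '2018-12-11 00:00', '2018-11-25 00:02'])
df = pd.DataFrame({'qKfz_gesamt': [0, 1, 99, 2]}, index=idx)

# Inspect the real extent and ordering of the index
print(df.index.min(), df.index.max())
print(df.index.is_monotonic_increasing)   # False: the index is not sorted

# Sort first, then resample: the last bin follows the latest timestamp
df = df.sort_index()
out = df['qKfz_gesamt'].resample('5min').mean()
print(out.index[-1])

# Cutting the index at the desired end date removes the long NaN tail
trimmed = df.loc[:'2018-11-30']
out_trimmed = trimmed['qKfz_gesamt'].resample('5min').mean()
print(out_trimmed)
```

If index.max() is later than expected, the extra days come from the data itself (for example a misparsed date), not from resample.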

Upvotes: 1
