How to downsampling time series data in pandas?

Question

I have a time series in pandas that looks like this (order by id):

id    time    value
 1       0        2
 1       1        4
 1       2        5
 1       3       10
 1       4       15
 1       5       16
 1       6       18
 1       7       20
 2      15        3
 2      16        5
 2      17        8
 2      18       10
 4       6        5
 4       7        6

I want downsampling time from 1 minute to 3 minutes for each group id. And value is a maximum of group (id and 3 minutes).

The output should be like:

id    time    value
 1       0        5
 1       1       16
 1       2       20
 2       0        8
 2       1       10
 4       0        6

I tried loop it take long time process.

Any idea how to solve this for large dataframe?

Thanks!

Scott Boston · Accepted Answer

Use np.r_ and .iloc with groupby:

df.groupby('id')['value'].apply(lambda x: x.iloc[np.r_[2:len(x):3,-1]])

Output:

id    
1   2      5
    5     16
    7     20
2   10     8
    11    10
4   13     6
Name: value, dtype: int64

Going a little further with column naming etc..

df_out = df.groupby('id')['value']\
           .apply(lambda x: x.iloc[np.r_[2:len(x):3,-1]]).reset_index()
df_out.assign(time=df_out.groupby('id').cumcount()).drop('level_1', axis=1)

Output:

   id  value  time
0   1      5     0
1   1     16     1
2   1     20     2
3   2      8     0
4   2     10     1
5   4      6     0

How to downsampling time series data in pandas?

Answers (2)

Related Questions