lamo_738
lamo_738

Reputation: 440

Down-sampling Non-Time Series data

I have two dataframes where both are day-count data with respect to some start time. One dataframe is a signal data sampled at a higher rate compared to the other dataframe signal data. The overall time frame is same for both the signal dataframes. My question is what approach can I use to downsample the one with more rows to match the number of rows. the data frame looks like these: (where the first column is day count from a particular start time)

    Days  Data at 1hz
0  0.958        83.63
1  0.958        83.08
2  0.958        82.45
3  0.958        81.83
4  0.959        81.18

    Days  Data at 4hz
0  0.958        0.028
1  0.958        0.028
2  0.958        0.027
3  0.958        0.029
4  0.958        0.028

I have tried the pandas.resample() function but apparently it works for only timestamp index. And also rolling()function makes the other values over a window = 8 (diff in sampling ratio of data1 and data2) as Nan and does not remove the rows.

Is there a way, I can apply mean() or some other approach to make them both of same rows by removing some rows ?

Upvotes: 3

Views: 1885

Answers (2)

n4321d
n4321d

Reputation: 1205

an alternative option is to use groupby:

create data:

import pandas as pd
df = pd.DataFrame({'days': [0.958] * 100, 'data 4Hz': [*range(100)]})
df

output:
enter image description here

Downsample:

to go from 4 hz -> 1 hz we will downsample using a window of 4, and we use a mean here to downsample, but you can use anything that works with groupby.

window = 4
df_1hz = df.groupby((df.index/window).astype('i')).mean()
df_1hz.head()

output:
enter image description here

Upvotes: 0

Ben.T
Ben.T

Reputation: 29635

Using rolling is a good idea what you are missing is a dropna after to get rid of the extra rows. Here an example, with 3 rows per Date in df1 and 8 rows per Date in df2:

df1 = pd.DataFrame({'Date':[0.958]*3 + [0.959]*3, 
                    'Data_1': [83.63,83.08,82.45,81.83,82.76,84.97]})
df2 = pd.DataFrame({'Date':[0.958]*8 + [0.959]*8, 
                    'Data_4': [0.028,0.028,0.027,0.029]*4})

Now, you want to remove 5 rows in df2 per Date to get only 3 rows as in df1, you get it with groupby, rolling and dropna. Function mean can be changed if you want something else, and reset_index is more for cosmetic.

df3 = df2.groupby('Date').rolling(window=6).mean().dropna().reset_index(drop=True)

which gives the result for df3

     Data_4   Date
0  0.028000  0.958
1  0.027833  0.958
2  0.028000  0.958
3  0.028000  0.959
4  0.027833  0.959
5  0.028000  0.959

with only 3 rows per Date from data in df2.

Note that in the rolling, window = 6, which is the difference of rows per Date between df2 and df1 plus 1, otherwise you get one extra row.

Upvotes: 1

Related Questions