Down-sampling Non-Time Series data

Question

I have two dataframes where both are day-count data with respect to some start time. One dataframe is a signal data sampled at a higher rate compared to the other dataframe signal data. The overall time frame is same for both the signal dataframes. My question is what approach can I use to downsample the one with more rows to match the number of rows. the data frame looks like these: (where the first column is day count from a particular start time)

    Days  Data at 1hz
0  0.958        83.63
1  0.958        83.08
2  0.958        82.45
3  0.958        81.83
4  0.959        81.18

    Days  Data at 4hz
0  0.958        0.028
1  0.958        0.028
2  0.958        0.027
3  0.958        0.029
4  0.958        0.028

I have tried the pandas.resample() function but apparently it works for only timestamp index. And also rolling()function makes the other values over a window = 8 (diff in sampling ratio of data1 and data2) as Nan and does not remove the rows.

Is there a way, I can apply mean() or some other approach to make them both of same rows by removing some rows ?

Ben.T · Accepted Answer

Using rolling is a good idea what you are missing is a dropna after to get rid of the extra rows. Here an example, with 3 rows per Date in df1 and 8 rows per Date in df2:

df1 = pd.DataFrame({'Date':[0.958]*3 + [0.959]*3, 
                    'Data_1': [83.63,83.08,82.45,81.83,82.76,84.97]})
df2 = pd.DataFrame({'Date':[0.958]*8 + [0.959]*8, 
                    'Data_4': [0.028,0.028,0.027,0.029]*4})

Now, you want to remove 5 rows in df2 per Date to get only 3 rows as in df1, you get it with groupby, rolling and dropna. Function mean can be changed if you want something else, and reset_index is more for cosmetic.

df3 = df2.groupby('Date').rolling(window=6).mean().dropna().reset_index(drop=True)

which gives the result for df3

     Data_4   Date
0  0.028000  0.958
1  0.027833  0.958
2  0.028000  0.958
3  0.028000  0.959
4  0.027833  0.959
5  0.028000  0.959

with only 3 rows per Date from data in df2.

Note that in the rolling, window = 6, which is the difference of rows per Date between df2 and df1 plus 1, otherwise you get one extra row.

Down-sampling Non-Time Series data

Answers (2)

create data:

Downsample:

Related Questions