Reputation: 596
The df.head() of my data frame looks like this. I'm sampling my data at a frequency somewhere between 7 and 9 Hz and have about 100 unique values of Psnr. If I want to create a new data frame with one point per unique Psnr, representing the average over the last five minutes of that Psnr, what is the best way to do so?
Datum_Zeit Psnr
2018-10-04 14:43:11.410 2
2018-10-04 14:43:11.600 2
2018-10-04 14:43:11.790 2
2018-10-04 14:43:11.970 2
2018-10-04 14:43:12.160 2
One possible way is to create sub-data frames, filter them, and merge them again, starting as below, but I am sure there must be an easier solution:
dfs = {k: df[df['Psnr'] == k] for k in df['Psnr'].unique()}
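For comparison, the split, filter, and merge idea could be carried through roughly as follows. This is only a sketch: the Speed and Torque columns and their values are hypothetical sample data, and the names window, rows, and res are illustrative.

```python
import pandas as pd

# Hypothetical sample data; Speed and Torque are assumed measurement columns
df = pd.DataFrame({
    'Datum_Zeit': pd.to_datetime([
        '2018-10-04 14:38:11.410', '2018-10-04 14:39:11.600',
        '2018-10-04 14:40:11.790', '2018-10-04 14:41:11.970',
        '2018-10-04 14:43:12.160', '2018-10-04 14:39:11.600',
        '2018-10-04 14:40:11.790', '2018-10-04 14:44:11.970',
        '2018-10-04 14:45:12.160',
    ]),
    'Psnr': [2, 2, 2, 2, 2, 3, 3, 3, 3],
    'Speed': [8, 7, 1, 9, 5, 7, 1, 9, 5],
    'Torque': [6, 0, 1, 3, 1, 0, 1, 3, 1],
})

# Split: one sub-data frame per unique Psnr
dfs = {k: df[df['Psnr'] == k] for k in df['Psnr'].unique()}

window = pd.Timedelta(minutes=5)
rows = []
for psnr, sub in dfs.items():
    # Filter: keep rows newer than (latest timestamp of this Psnr - 5 minutes)
    recent = sub[sub['Datum_Zeit'] > sub['Datum_Zeit'].max() - window]
    # Average the measurement columns and label the row with its Psnr
    rows.append(recent[['Speed', 'Torque']].mean().rename(psnr))

# Merge: one row per Psnr
res = pd.DataFrame(rows).rename_axis('Psnr').reset_index()
print(res)
```

With about 100 Psnr values this loop is workable, but it makes one boolean-mask pass over the frame per Psnr, which is why a groupby-based approach is usually preferable.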
Upvotes: 2
Views: 40
Reputation: 164693
Since your dataframe is already sorted by time, you can calculate a min_times series aligned with your dataframe index via transform + last, then filter, and finally groupby:
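import pandas as pd

# Per-row cutoff: last timestamp of the row's Psnr group minus five minutes
min_times = df.groupby('Psnr')['Datum_Zeit'].transform('last') - pd.Timedelta('5 min')
res = df.loc[df['Datum_Zeit'] > min_times]\
.groupby('Psnr').mean(numeric_only=True).reset_index()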
print(res)
Psnr Speed Torque
0 2 5.5 1.25
1 3 7.0 2.00
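A self-contained sketch of the same steps, assuming sample data with Speed and Torque columns and a Datum_Zeit column already parsed as datetimes. Passing numeric_only=True is an addition here so that the timestamp column is left out of the mean on recent pandas versions.

```python
import pandas as pd

# Assumed sample data, sorted by time within each Psnr
df = pd.DataFrame({
    'Datum_Zeit': pd.to_datetime([
        '2018-10-04 14:38:11.410', '2018-10-04 14:39:11.600',
        '2018-10-04 14:40:11.790', '2018-10-04 14:41:11.970',
        '2018-10-04 14:43:12.160', '2018-10-04 14:39:11.600',
        '2018-10-04 14:40:11.790', '2018-10-04 14:44:11.970',
        '2018-10-04 14:45:12.160',
    ]),
    'Psnr': [2, 2, 2, 2, 2, 3, 3, 3, 3],
    'Speed': [8, 7, 1, 9, 5, 7, 1, 9, 5],
    'Torque': [6, 0, 1, 3, 1, 0, 1, 3, 1],
})

# transform('last') broadcasts each group's final timestamp back to every row,
# so the cutoff series shares the index of df and can be compared row by row
min_times = df.groupby('Psnr')['Datum_Zeit'].transform('last') - pd.Timedelta('5 min')

# Keep only rows inside each group's last five minutes, then average per Psnr
res = (
    df.loc[df['Datum_Zeit'] > min_times]
      .groupby('Psnr')
      .mean(numeric_only=True)
      .reset_index()
)
print(res)
```

Note that transform('last') relies on the rows being in time order within each group; if that is not guaranteed, transform('max') gives the same cutoff without the ordering assumption.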
Upvotes: 0
Reputation: 862791
I believe you first need to filter the last 5 minutes of data for each group and then aggregate the mean by column Psnr:
print (df)
Datum_Zeit Psnr Speed Torque
0 2018-10-04 14:38:11.410 2 8 6
1 2018-10-04 14:39:11.600 2 7 0
2 2018-10-04 14:40:11.790 2 1 1
3 2018-10-04 14:41:11.970 2 9 3
4 2018-10-04 14:43:12.160 2 5 1
5 2018-10-04 14:39:11.600 3 7 0
6 2018-10-04 14:40:11.790 3 1 1
7 2018-10-04 14:44:11.970 3 9 3
8 2018-10-04 14:45:12.160 3 5 1
df['Datum_Zeit'] = pd.to_datetime(df['Datum_Zeit'])
s = df.groupby('Psnr')['Datum_Zeit'].transform('max')
df = df[df['Datum_Zeit'].between(s - pd.Timedelta(300, 's'), s)]
print (df)
Datum_Zeit Psnr Speed Torque
1 2018-10-04 14:39:11.600 2 7 0
2 2018-10-04 14:40:11.790 2 1 1
3 2018-10-04 14:41:11.970 2 9 3
4 2018-10-04 14:43:12.160 2 5 1
7 2018-10-04 14:44:11.970 3 9 3
8 2018-10-04 14:45:12.160 3 5 1
If you need the mean of all columns except the datetime:
df = df.groupby('Psnr').mean(numeric_only=True)
print (df)
Speed Torque
Psnr
2 5.5 1.25
3 7.0 2.00
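If you need only the mean of the datetime column:
import numpy as np
# Run this on the filtered df (before it is replaced by a groupby mean), which still has Datum_Zeit
df['dates'] = df['Datum_Zeit'].astype(np.int64)
df1 = pd.to_datetime(df.groupby('Psnr')['dates'].mean()).reset_index()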
print (df1)
Psnr dates
0 2 2018-10-04 14:40:56.880000000
1 3 2018-10-04 14:44:42.064999936
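The trailing .064999936 comes from averaging nanosecond integers as floats. A possible way to tidy this up, sketched here on the already filtered rows, is to round the averaged timestamps to milliseconds. The names recent, ns, and mean_ns are illustrative, and the explicit cast to datetime64[ns] is an assumption made so that the integers are always nanoseconds.

```python
import pandas as pd

# The rows that survive the five-minute filter
recent = pd.DataFrame({
    'Datum_Zeit': pd.to_datetime([
        '2018-10-04 14:39:11.600', '2018-10-04 14:40:11.790',
        '2018-10-04 14:41:11.970', '2018-10-04 14:43:12.160',
        '2018-10-04 14:44:11.970', '2018-10-04 14:45:12.160',
    ]),
    'Psnr': [2, 2, 2, 2, 3, 3],
})

# Timestamps as integer nanoseconds since the epoch
ns = recent['Datum_Zeit'].astype('datetime64[ns]').astype('int64')

# Per-Psnr mean; the result is float64, so a few hundred ns of rounding error is possible
mean_ns = ns.groupby(recent['Psnr']).mean()

# Back to datetimes, rounded to the millisecond resolution of the input
dates = pd.to_datetime(mean_ns.round().astype('int64')).dt.round('ms')
df1 = dates.rename('dates').reset_index()
print(df1)
```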
Upvotes: 2