Reputation: 323
This question concerns the dataframes in Pandas. Here is my problem. I have a DataFrame of this format:
Date | Weight |
---|---|
2021-09-30 00:00:00.00 | 10 |
2021-09-30 00:30:00.00 | 12 |
2021-09-30 01:00:00.00 | 10 |
2021-09-30 01:30:00.00 | 13 |
I would to create a new column which takes the mean over each hour by conserving my dataframe. The result should be like:
Date | Weight | Mean |
---|---|---|
2021-09-30 00:00:00.00 | 10 | 11 |
2021-09-30 00:30:00.00 | 12 | 11 |
2021-09-30 01:00:00.00 | 10 | 11.5 |
2021-09-30 01:30:00.00 | 13 | 11.5 |
Here is an example code:
timestamp = pd.date_range(start="2021-10-01T00:00:000Z", end="2021-10-02T00:00:000Z", freq="200L")
df = timestamp.to_frame(index=True, name='timestamp')
df['timestamp'] = pd.to_datetime(df['timestamp'])
df['weight'] = np.random.randint(1, 20, df.shape[0])
df = df.set_index(['timestamp'])
It creates a DataFrame like this:
Now, I can do a resample like:
dfresample = df.resample('8h').mean()
However, this new DataFrame is reduced to 4 rows with an average of 8 hours. I would like to conserve my 432001 rows of first DataFrame df, where a new column mean is added, where each row takes the mean of that corresponding '8h' average. How could I do that?
Thanks
Upvotes: 0
Views: 594
Reputation: 2541
Better way, without map
:
df['mean'] = df.groupby(df.index.floor('H'))['weight'].transform('mean')
or using map
:
You can take the mean first,
mean = df.groupby(df.index.floor('H'))['weight'].mean()
and put them back in a new column
df['mean'] = df.index.floor('H').map(mean)
Upvotes: 2