geekygeek
geekygeek

Reputation: 323

Average per time interval by keeping the rows of a DataFrame

This question concerns the dataframes in Pandas. Here is my problem. I have a DataFrame of this format:

Date Weight
2021-09-30 00:00:00.00 10
2021-09-30 00:30:00.00 12
2021-09-30 01:00:00.00 10
2021-09-30 01:30:00.00 13

I would to create a new column which takes the mean over each hour by conserving my dataframe. The result should be like:

Date Weight Mean
2021-09-30 00:00:00.00 10 11
2021-09-30 00:30:00.00 12 11
2021-09-30 01:00:00.00 10 11.5
2021-09-30 01:30:00.00 13 11.5

Here is an example code:

timestamp = pd.date_range(start="2021-10-01T00:00:000Z", end="2021-10-02T00:00:000Z", freq="200L")
df = timestamp.to_frame(index=True, name='timestamp')
df['timestamp'] = pd.to_datetime(df['timestamp'])
df['weight'] = np.random.randint(1, 20, df.shape[0])
df = df.set_index(['timestamp'])

It creates a DataFrame like this: enter image description here

Now, I can do a resample like: dfresample = df.resample('8h').mean()

which gives: enter image description here

However, this new DataFrame is reduced to 4 rows with an average of 8 hours. I would like to conserve my 432001 rows of first DataFrame df, where a new column mean is added, where each row takes the mean of that corresponding '8h' average. How could I do that?

Thanks

Upvotes: 0

Views: 594

Answers (1)

Raymond Kwok
Raymond Kwok

Reputation: 2541

Better way, without map:

df['mean'] = df.groupby(df.index.floor('H'))['weight'].transform('mean')

or using map:

You can take the mean first,

mean = df.groupby(df.index.floor('H'))['weight'].mean()

and put them back in a new column

df['mean'] = df.index.floor('H').map(mean)

Upvotes: 2

Related Questions