I have this pandas DataFrame df:
Station DateTime Record
A 2017-01-01 00:00:00 20
A 2017-01-01 01:00:00 22
A 2017-01-01 02:00:00 20
A 2017-01-01 03:00:00 18
B 2017-01-01 00:00:00 22
B 2017-01-01 01:00:00 24
I want to compute the average Record per DateTime (basically per hour) across stations A and B. If either A or B has no record for some DateTime, then the Record value should be treated as 0 for that station.
It can be assumed that every hour appears in DateTime for at least one Station.
This is the expected result:
DateTime Avg_Record
2017-01-01 00:00:00 21
2017-01-01 01:00:00 23
2017-01-01 02:00:00 10
2017-01-01 03:00:00 9
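A minimal sketch for rebuilding the sample frame above in code so that it can be run directly. The values are copied from the table; building it from a dict rather than parsing text is just a choice made for this sketch.

```python
import pandas as pd

# Sample data from the question: station B has no rows for 02:00 and 03:00
df = pd.DataFrame({
    'Station': ['A', 'A', 'A', 'A', 'B', 'B'],
    'DateTime': pd.to_datetime([
        '2017-01-01 00:00:00', '2017-01-01 01:00:00',
        '2017-01-01 02:00:00', '2017-01-01 03:00:00',
        '2017-01-01 00:00:00', '2017-01-01 01:00:00',
    ]),
    'Record': [20, 22, 20, 18, 22, 24],
})
print(df)
```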
Here is a solution for the two-station case:
g = df.groupby('DateTime')['Record']
df_out = g.mean()
m = g.count() == 1
df_out.loc[m] = df_out.loc[m] / 2
df_out = df_out.reset_index()
Or an uglier one-liner:
df = df.groupby('DateTime')['Record'].apply(
lambda x: x.mean() if x.size == 2 else x.values[0]/2
).reset_index()
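Both versions above rely on there being exactly two stations (they divide by 2 when only one record exists for an hour). Below is a sketch of a more general variant, assuming each station has at most one row per hour and that every station appears somewhere in df. It sums the records per hour and divides by the number of distinct stations: a missing row adds nothing to the sum, which is exactly what a 0 would do.

```python
import pandas as pd

# Data from the question
df = pd.DataFrame({
    'Station': ['A', 'A', 'A', 'A', 'B', 'B'],
    'DateTime': pd.to_datetime([
        '2017-01-01 00:00:00', '2017-01-01 01:00:00',
        '2017-01-01 02:00:00', '2017-01-01 03:00:00',
        '2017-01-01 00:00:00', '2017-01-01 01:00:00',
    ]),
    'Record': [20, 22, 20, 18, 22, 24],
})

# Number of stations seen anywhere in the data (assumed to be all stations)
n_stations = df['Station'].nunique()

# Per-hour total divided by station count: absent stations count as 0
df_out = (
    df.groupby('DateTime')['Record']
      .sum()
      .div(n_stations)
      .rename('Avg_Record')
      .reset_index()
)
print(df_out)  # Avg_Record: 21.0, 23.0, 10.0, 9.0
```

If the full list of stations is known up front, using its length instead of nunique() also covers a station that has no rows at all.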
Proof (with slightly different sample data, where B has records for 01:00 and 02:00):
import pandas as pd
data = '''\
Station DateTime Record
A 2017-01-01T00:00:00 20
A 2017-01-01T01:00:00 22
A 2017-01-01T02:00:00 20
A 2017-01-01T03:00:00 18
B 2017-01-01T01:00:00 22
B 2017-01-01T02:00:00 24'''
from io import StringIO  # pd.compat.StringIO is not available in recent pandas versions
fileobj = StringIO(data)
df = pd.read_csv(fileobj, sep=r'\s+', parse_dates=['DateTime'])
# Create a grouper and get the mean
g = df.groupby('DateTime')['Record']
df_out = g.mean()
# Divide by 2 where only one station has a record for that hour
m = g.count() == 1
df_out.loc[m] = df_out.loc[m] / 2
# Reset index to get a dataframe format again
df_out = df_out.reset_index()
print(df_out)
Output:
DateTime Record
0 2017-01-01 00:00:00 10.0
1 2017-01-01 01:00:00 22.0
2 2017-01-01 02:00:00 22.0
3 2017-01-01 03:00:00 9.0
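The zero-filling can also be made explicit by pivoting to one column per station, filling the gaps with 0, and then averaging across the columns. This sketch uses pivot_table with fill_value=0 on the data from the question (not the proof data above). Using aggfunc='sum' is an assumption that only matters if a station had duplicate rows for the same hour.

```python
import pandas as pd

# Data from the question
df = pd.DataFrame({
    'Station': ['A', 'A', 'A', 'A', 'B', 'B'],
    'DateTime': pd.to_datetime([
        '2017-01-01 00:00:00', '2017-01-01 01:00:00',
        '2017-01-01 02:00:00', '2017-01-01 03:00:00',
        '2017-01-01 00:00:00', '2017-01-01 01:00:00',
    ]),
    'Record': [20, 22, 20, 18, 22, 24],
})

# One row per hour, one column per station; missing station/hour cells become 0
wide = df.pivot_table(index='DateTime', columns='Station',
                      values='Record', aggfunc='sum', fill_value=0)

# Row-wise mean over all station columns, with the filled zeros included
df_out = wide.mean(axis=1).rename('Avg_Record').reset_index()
print(df_out)  # Avg_Record: 21.0, 23.0, 10.0, 9.0
```

Unlike the count-based fix, this does not need to know how many stations are missing for a given hour, because every station column is present in wide.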