Reputation: 87
Suppose I have this DataFrame:
import pandas as pd
dat = [['yes', 'dog', 20, 4, 60, 400], ['yes', 'dog', 20, 4, 60, 300], ['yes', 'cat', 20, 10, 10, float('nan')]]
df_dat = pd.DataFrame(dat, columns=["Time", "animal", "val", "val2", "val3", "val4"])
I want a DataFrame that is grouped by "Time" and "animal" and then takes the mean across subsets of the remaining columns. For example, one pair of subsets is ["val", "val3"] and ["val2", "val4"].
In other words, I want to average the result of df_dat.groupby(["Time", "animal"]).mean() across each subset of value columns.
The output I'm looking for looks like (but in dataframe format):
[Index, 'val/val3', 'val2/val4']
[('yes','dog'),40,177]
[('yes','cat'),15,10]
Upvotes: 2
Views: 640
Reputation: 51155
Setup
df = df_dat.groupby(['Time', 'animal']).mean()
subsets = [["val","val3"], ["val2","val4"]]
Using a dictionary comprehension and assign:
df.assign(**{'/'.join(cols): df[cols].mean(1) for cols in subsets})
             val  val2  val3   val4  val/val3  val2/val4
Time animal
yes  cat      20    10    10    NaN      15.0       10.0
     dog      20     4    60  350.0      40.0      177.0
If you only want the subset columns:
pd.DataFrame({'/'.join(cols): df[cols].mean(1) for cols in subsets})
             val/val3  val2/val4
Time animal
yes  cat         15.0       10.0
     dog         40.0      177.0
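For anyone who wants to run this end to end, here is a minimal self-contained sketch that combines the question's data with the subset-only approach. The helper name subset_means is hypothetical (it is not part of pandas), and the positional mean(1) is written out as mean(axis=1) for readability.

```python
import pandas as pd

# Sample data from the question.
dat = [
    ["yes", "dog", 20, 4, 60, 400],
    ["yes", "dog", 20, 4, 60, 300],
    ["yes", "cat", 20, 10, 10, float("nan")],
]
df_dat = pd.DataFrame(dat, columns=["Time", "animal", "val", "val2", "val3", "val4"])


def subset_means(frame, keys, subsets):
    """Hypothetical helper: group by `keys`, take per-group column means,
    then average those means row-wise across each column subset."""
    grouped = frame.groupby(keys).mean()
    # One output column per subset, named by joining the column names with "/".
    return pd.DataFrame(
        {"/".join(cols): grouped[cols].mean(axis=1) for cols in subsets}
    )


result = subset_means(
    df_dat, ["Time", "animal"], [["val", "val3"], ["val2", "val4"]]
)
print(result)
```

The result is indexed by the (Time, animal) pairs, so a single value can be looked up with something like result.loc[("yes", "dog"), "val/val3"].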
Upvotes: 1
Reputation: 59274
I believe you need:
ndf = df_dat.groupby(['Time', 'animal']).mean()
ndf['v1v3'], ndf['v2v4'] = ndf[['val', 'val3']].mean(1), ndf[['val2', 'val4']].mean(1)
Outputs
             val  val2  val3   val4  v1v3   v2v4
Time animal
yes  cat      20    10    10    NaN  15.0   10.0
     dog      20     4    60  350.0  40.0  177.0
You can, of course, select just the mean columns:
ndf[['v1v3', 'v2v4']]
             v1v3   v2v4
Time animal
yes  cat     15.0   10.0
     dog     40.0  177.0
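One detail worth noting in the output: the cat row has NaN for val4, yet its val2/val4 mean is 10.0 rather than NaN. That is because DataFrame.mean skips missing values by default (skipna=True). The sketch below rebuilds just the two grouped columns by hand (values copied from the output above) to show the difference; whether you want skipna=False depends on how missing data should be treated in your case.

```python
import pandas as pd

# Grouped means for val2 and val4, copied from the output above.
means = pd.DataFrame(
    {"val2": [10.0, 4.0], "val4": [float("nan"), 350.0]},
    index=pd.MultiIndex.from_tuples(
        [("yes", "cat"), ("yes", "dog")], names=["Time", "animal"]
    ),
)

# Default behaviour: NaN is ignored, so cat averages only val2.
skipping = means.mean(axis=1)

# With skipna=False, any NaN in the row makes the row mean NaN.
strict = means.mean(axis=1, skipna=False)

print(skipping)
print(strict)
```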
Upvotes: 1