groupby and mean of subsets of columns in Python dataframe

Question

If I have this dataframe:

import pandas as pd

dat = [['yes', 'dog', 20, 4, 60, 400], ['yes', 'dog', 20, 4, 60, 300], ['yes', 'cat', 20, 10, 10, float('nan')]]
df_dat = pd.DataFrame(dat, columns=["Time", "animal", "val", "val2", "val3", "val4"])

I want to get a dataframe that groups by "Time" and "animal" and then takes the mean across subsets of the other columns. The subsets are ["val","val3"] and ["val2","val4"].

Basically, something that takes the result of df_dat.groupby(["Time","animal"]).mean() and then averages it across each subset of value columns.

The output I'm looking for looks like (but in dataframe format):

[Index, 'val/val3', 'val2/val4']
[('yes','dog'),40,177]
[('yes','cat'),15,10]

user3483203 · Accepted Answer

Setup

df = df_dat.groupby(['Time', 'animal']).mean()
subsets = [["val","val3"], ["val2","val4"]]

Using a dictionary comprehension and assign:

df.assign(**{'/'.join(cols): df[cols].mean(axis=1) for cols in subsets})

             val  val2  val3   val4  val/val3  val2/val4
Time animal
yes  cat      20    10    10    NaN      15.0       10.0
     dog      20     4    60  350.0      40.0      177.0
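
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach. The variable names `grouped`, `new_cols`, and `result` are my own choices for illustration and do not appear in the original answer; the data is the one from the question.

```python
import pandas as pd

# Data from the question.
dat = [
    ["yes", "dog", 20, 4, 60, 400],
    ["yes", "dog", 20, 4, 60, 300],
    ["yes", "cat", 20, 10, 10, float("nan")],
]
df_dat = pd.DataFrame(dat, columns=["Time", "animal", "val", "val2", "val3", "val4"])

# Step 1: per-group mean of every value column.
grouped = df_dat.groupby(["Time", "animal"]).mean()

# Step 2: row-wise mean across each subset of columns,
# keyed by a label such as "val/val3".
subsets = [["val", "val3"], ["val2", "val4"]]
new_cols = {"/".join(cols): grouped[cols].mean(axis=1) for cols in subsets}

# Step 3: attach the new columns; ** unpacking is needed because
# the labels contain "/" and cannot be written as plain keyword arguments.
result = grouped.assign(**new_cols)
print(result)
```

Note that this is a mean of group means, so for "dog" the val2/val4 value is (4 + 350) / 2 = 177, not a mean over the pooled raw values.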

If you only want the subset columns:

pd.DataFrame({'/'.join(cols): df[cols].mean(axis=1) for cols in subsets})

             val/val3  val2/val4
Time animal
yes  cat         15.0       10.0
     dog         40.0      177.0
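
The "cat" row gets 10.0 for val2/val4 even though its val4 is NaN, because `mean` skips missing values by default. If a missing input should instead make the result missing, `skipna=False` can be passed. The sketch below rebuilds just the two relevant grouped columns by hand so it can run on its own; the names `lenient` and `strict` are mine, not from the answer.

```python
import pandas as pd

# Hand-built copy of the grouped means for val2 and val4.
index = pd.MultiIndex.from_tuples(
    [("yes", "cat"), ("yes", "dog")], names=["Time", "animal"]
)
grouped = pd.DataFrame(
    {"val2": [10.0, 4.0], "val4": [float("nan"), 350.0]}, index=index
)

# Default behaviour: NaN is ignored, so "cat" averages only val2.
lenient = grouped[["val2", "val4"]].mean(axis=1)

# skipna=False: any NaN in the row makes the row mean NaN.
strict = grouped[["val2", "val4"]].mean(axis=1, skipna=False)

print(lenient)
print(strict)
```

Which behaviour is right depends on whether a missing val4 should count as "no information" or invalidate the combined value.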
