Reputation: 521
I have a large dataframe of the following structure which is simplified for the purpose of this question:
A B C D ... J K
date1 time1 1 1 ... 1 1
date2 time2 2 2 ... 2 2
Essentially the first 3 columns all represent string data, i.e. times and dates. I am reading the dataframe in from a CSV file, and the data after the times has multiple points on a single day for each column.
What I wish to do is find a way to group all the data by date and create mean values per day which account for multiple datapoints on a single day. This is handled nicely by groupby(dates); however, I lose all of the other data not included in the dates column, so the result is like so:
Before groupby().mean():
A B C D ... J K
date1 time1 1 1 ... 1 1
date2 time2 2 2 ... 2 2
date2 time3 1 1 ... 1 1
After:
A C D ... J K
date1 1 1 ... 1 1
date2 1.5 1.5 ... 1.5 1.5
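The column-dropping behaviour above can be reproduced with a minimal sketch (pandas assumed; the column names and values are illustrative, matching the simplified tables):

```python
import pandas as pd

# A holds dates, B holds times, the rest are numeric readings
df = pd.DataFrame({
    'A': ['date1', 'date2', 'date2'],
    'B': ['time1', 'time2', 'time3'],
    'C': [1, 2, 1],
    'D': [1, 2, 1],
})

# A plain mean() keeps only the numeric columns, so B disappears
out = df.groupby('A', as_index=False).mean(numeric_only=True)
print(out)
# B is gone; date2's numeric columns average to 1.5
```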
My ideal output would be to keep the data in the time column and date column whilst still creating mean values based on the days. This would result in the following:
Ideal output:
A B C D ... J K
date1 time1 1 1 ... 1 1
date2 time2 1.5 1.5 ... 1.5 1.5
Upvotes: 0
Views: 817
Reputation: 863556
The problem is that you need to aggregate the columns with strings and times too, e.g. by first, else they are omitted. So a possible solution is to create a dict of aggregation functions and use groupby + agg + reset_index + reindex_axis:
print (df)
A B C D E J K
0 a date1 time1 1 1 1 1
1 b date2 time2 2 2 2 2
2 c date2 time3 1 1 1 1
cols = ['A','B','C']
d = {x:'mean' for x in df.columns.difference(cols)}
d['A'] = 'first'
d['C'] = 'first'
print (d)
{'E': 'mean', 'D': 'mean', 'J': 'mean', 'A': 'first', 'C': 'first', 'K': 'mean'}
df1 = df.groupby('B').agg(d).reset_index().reindex_axis(df.columns, axis=1)
print (df1)
A B C D E J K
0 a date1 time1 1.0 1.0 1.0 1.0
1 b date2 time2 1.5 1.5 1.5 1.5
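Note that reindex_axis was removed in later pandas versions; the same approach works with reindex(columns=...) instead. A sketch against the same sample frame:

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['a', 'b', 'c'],
    'B': ['date1', 'date2', 'date2'],
    'C': ['time1', 'time2', 'time3'],
    'D': [1, 2, 1],
    'E': [1, 2, 1],
    'J': [1, 2, 1],
    'K': [1, 2, 1],
})

# mean for the numeric columns, first for the string columns
cols = ['A', 'B', 'C']
d = {x: 'mean' for x in df.columns.difference(cols)}
d['A'] = 'first'
d['C'] = 'first'

# reindex(columns=...) restores the original column order,
# replacing the removed reindex_axis(..., axis=1)
df1 = df.groupby('B').agg(d).reset_index().reindex(columns=df.columns)
print(df1)
```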
Upvotes: 1