cd123
cd123

Reputation: 521

Pandas- Groupby multiple columns and mean from a single column

I have a large dataframe of the following structure which is simplified for the purpose of this question:

  A     B     C D ... J K 
  date1 time1 1 1 ... 1 1
  date2 time2 2 2 ... 2 2

Essentially the first 3 columns all represent string data i.e times and dates. The dataframe I am reading in from a csv file and the data after the times has multiple points on a single day for each column.

What I wish to do is find a way to group all the data by dates and create mean values per day which account for multiple datapoints on a single day. This is handled nicely by using group.by(dates) however I lose all of the other data not included in the dates column so the result is like so:

Before group.by().mean():

   A     B     C D ... J K
   date1 time1 1 1 ... 1 1
   date2 time2 2 2 ... 2 2
   date2 time3 1 1 ... 1 1

After:

   A      C   D   ... J   K
   date1  1   1       1   1
   date2  1.5 1.5 ... 1.5 1.5

My ideal output would be to keep the data in the time column and date column whilst still creating mean values based on the days. This would result in the following:

Ideal output:

   A      B     C   D   ... J   K
   date1  time1 1   1       1   1
   date2  time2 1.5 1.5 ... 1.5 1.5

Upvotes: 0

Views: 817

Answers (1)

jezrael
jezrael

Reputation: 863556

There is problem you need aggregate columns with strings and times too, e.g. by first, else are omited.

So possible solution is create dict of aggregation functions and use groupby + agg + reset_index + reindex_axis:

print (df)

   A      B      C  D  E  J  K
0  a  date1  time1  1  1  1  1
1  b  date2  time2  2  2  2  2
2  c  date2  time3  1  1  1  1

cols = ['A','B','C']
d = {x:'mean' for x in df.columns.difference(cols)}
d['A'] = 'first'
d['C'] = 'first'
print (d)
{'E': 'mean', 'D': 'mean', 'J': 'mean', 'A': 'first', 'C': 'first', 'K': 'mean'}

df1 = df.groupby('B').agg(d).reset_index().reindex_axis(df.columns, axis=1)
print (df1)
   A      B      C    D    E    J    K
0  a  date1  time1  1.0  1.0  1.0  1.0
1  b  date2  time2  1.5  1.5  1.5  1.5

Upvotes: 1

Related Questions