morris

Reputation: 147

Pandas: groupby and counting data in other columns

I have data with four columns: Id, CreationDate, Score and ViewCount.

CreationDate has the following format, for example: 2011-11-30 19:41:14.960. I need to group the rows by the year of CreationDate, count them, sum Score and ViewCount as well, and put the results in additional columns.

I want to do this with the pandas library.

Thanks!

Before the change (sample data):

     Id   CreationDate              Score   ViewCount
0    1    2011-11-30 19:15:54.070   25      1526
1    2    2011-11-30 19:41:14.960   20      601
2    3    2012-11-30 19:42:45.470   36      1015
3    4    2018-11-30 19:44:55.593   8       1941
4    5    2011-11-30 19:53:23.387   11      5053
5    6    2018-11-30 20:04:43.757   25      5123
6    7    2011-11-30 20:08:23.267   53      8945

After the change, the data should look like this:

     Id   CreationDate              Score   ViewCount
0    1    2011                      109     16125
2    3    2012                      36      1015
3    4    2018                      33      7064

Upvotes: 1

Views: 891

Answers (1)

jezrael

Reputation: 862841

You can convert the column to years with Series.dt.year and aggregate with GroupBy.agg, passing a dictionary that maps each column to its aggregation function. Finally, add DataFrame.reindex if you need the columns in the same order as in the original DataFrame:

import pandas as pd

# Convert to datetimes if the column is still stored as strings
df['CreationDate'] = pd.to_datetime(df['CreationDate'])

# Group by year; keep the first Id and sum Score and ViewCount per year
df1 = (df.groupby(df['CreationDate'].dt.year)
         .agg({'Id':'first', 'Score':'sum', 'ViewCount':'sum'})
         .reset_index()
         .reindex(columns=df.columns)
       )

print(df1)
   Id  CreationDate  Score  ViewCount
0   1          2011    109      16125
1   3          2012     36       1015
2   4          2018     33       7064
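
If you also want the number of rows per year (the question mentions counting them), one possible variant is named aggregation, which I believe is available from pandas 0.25 onwards. This is only a sketch built on the sample data from the question; the output column names Year and Posts are my own choice for illustration and do not come from the original data:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Id": [1, 2, 3, 4, 5, 6, 7],
    "CreationDate": [
        "2011-11-30 19:15:54.070",
        "2011-11-30 19:41:14.960",
        "2012-11-30 19:42:45.470",
        "2018-11-30 19:44:55.593",
        "2011-11-30 19:53:23.387",
        "2018-11-30 20:04:43.757",
        "2011-11-30 20:08:23.267",
    ],
    "Score": [25, 20, 36, 8, 11, 25, 53],
    "ViewCount": [1526, 601, 1015, 1941, 5053, 5123, 8945],
})
df["CreationDate"] = pd.to_datetime(df["CreationDate"])

# Grouping key: the year of each row ("Year" is an illustrative name)
years = df["CreationDate"].dt.year.rename("Year")

summary = (
    df.groupby(years)
      .agg(
          Posts=("Id", "count"),          # rows per year ("Posts" is an illustrative name)
          Score=("Score", "sum"),         # total Score per year
          ViewCount=("ViewCount", "sum"), # total ViewCount per year
      )
      .reset_index()
)

print(summary)
```

For the sample data this gives 4 rows for 2011, 1 for 2012 and 2 for 2018, with the same Score and ViewCount sums as above.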

Upvotes: 3
