Reputation: 147
I have data with four columns: Id, CreationDate, Score, and ViewCount.
CreationDate has the following format, for example: 2011-11-30 19:41:14.960.
I need to group by the year of CreationDate, count the rows in each year, sum Score and ViewCount, and add the results as additional columns.
I want to do this with the pandas library.
Thanks!
Sample data before the change:
   Id             CreationDate  Score  ViewCount
0   1  2011-11-30 19:15:54.070     25       1526
1   2  2011-11-30 19:41:14.960     20        601
2   3  2012-11-30 19:42:45.470     36       1015
3   4  2018-11-30 19:44:55.593      8       1941
4   5  2011-11-30 19:53:23.387     11       5053
5   6  2018-11-30 20:04:43.757     25       5123
6   7  2011-11-30 20:08:23.267     53       8945
Desired result after the change:
   Id  CreationDate  Score  ViewCount
0   1          2011    109      16125
2   3          2012     36       1015
3   4          2018     33       7064
Upvotes: 1
Views: 891
Reputation: 862841
You can convert the column to years with Series.dt.year and aggregate with GroupBy.agg, passing a dictionary that maps each column to its aggregation function. Finally, add DataFrame.reindex if you need the same column order as in the original DataFrame:
import pandas as pd

# if necessary, convert the column to datetimes
df['CreationDate'] = pd.to_datetime(df['CreationDate'])

# group by year, keep the first Id per year, and sum Score and ViewCount
df1 = (df.groupby(df['CreationDate'].dt.year)
         .agg({'Id': 'first', 'Score': 'sum', 'ViewCount': 'sum'})
         .reset_index()
         .reindex(columns=df.columns)
      )
print(df1)
   Id  CreationDate  Score  ViewCount
0   1          2011    109      16125
1   3          2012     36       1015
2   4          2018     33       7064
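The question also asks for a count per year, which the aggregation above does not produce. The following is a minimal sketch of one way to add it, assuming pandas 0.25 or later (needed for named aggregation). The column name Count is an illustrative choice and does not appear in the original post; the sample rows are copied from the question.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "Id": [1, 2, 3, 4, 5, 6, 7],
    "CreationDate": [
        "2011-11-30 19:15:54.070",
        "2011-11-30 19:41:14.960",
        "2012-11-30 19:42:45.470",
        "2018-11-30 19:44:55.593",
        "2011-11-30 19:53:23.387",
        "2018-11-30 20:04:43.757",
        "2011-11-30 20:08:23.267",
    ],
    "Score": [25, 20, 36, 8, 11, 25, 53],
    "ViewCount": [1526, 601, 1015, 1941, 5053, 5123, 8945],
})
df["CreationDate"] = pd.to_datetime(df["CreationDate"])

# Named aggregation: each keyword becomes an output column.
# "Count" is a hypothetical column name, not from the original post;
# it counts the non-null Id values in each year.
summary = (
    df.groupby(df["CreationDate"].dt.year)
      .agg(
          Id=("Id", "first"),
          Count=("Id", "count"),
          Score=("Score", "sum"),
          ViewCount=("ViewCount", "sum"),
      )
      .reset_index()
)
print(summary)
```

With the sample data, Count should come out as 4, 1, and 2 for 2011, 2012, and 2018. If Id can contain missing values, counting a different column or using "size" may be a better fit.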
Upvotes: 3