Reputation: 12539
I am computing several aggregate statistics on a grouped DataFrame. For one column in particular, ios_id, I would like both a count and a distinct count. I'm not sure how to output these to two separate columns with different names. As of right now, the distinct count just overwrites the count.
How do I output both the distinct count and the count for the ios_id column to two separate columns?
df_new = df.groupby('video_id').agg({"ios_id": np.count_nonzero,
                                     "ios_id": pd.Series.nunique,
                                     "feed_position": np.average,
                                     "time_watched": np.sum,
                                     "video_length": np.sum}).sort('ios_id', ascending=False)
Upvotes: 2
Views: 297
Reputation: 109666
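Something like this should work. Note the nested dictionary structure for ios_id, which maps each output column name to its aggregation. (This nested-dictionary renaming was deprecated in pandas 0.20 and removed in pandas 1.0, so it only runs on older versions.)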
# "nunique" returns the number of distinct values; "unique" would return the values themselves.
df_new = df.groupby('video_id').agg({"ios_id": {"count": "count",
                                                "distinct": "nunique"},
                                     "feed_position": np.average,
                                     "time_watched": np.sum,
                                     "video_length": np.sum})
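On newer pandas releases, the same two-column result can be sketched with named aggregation (available since pandas 0.25), where each keyword names an output column and maps it to a (column, function) pair. The sample data and the output names ios_id_count and ios_id_distinct below are made up for illustration; only the input column names come from the question. Note also that "count" counts non-null values, which is not quite the same as np.count_nonzero in the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "video_id": [1, 1, 1, 2, 2],
    "ios_id": ["a", "a", "b", "c", None],
    "feed_position": [1, 2, 3, 4, 6],
    "time_watched": [10, 20, 30, 40, 50],
    "video_length": [60, 60, 60, 90, 90],
})

df_new = (
    df.groupby("video_id")
      .agg(
          # Two different aggregations of the same input column,
          # each written to its own output column.
          ios_id_count=("ios_id", "count"),        # non-null rows per group
          ios_id_distinct=("ios_id", "nunique"),   # distinct non-null values per group
          feed_position=("feed_position", "mean"),
          time_watched=("time_watched", "sum"),
          video_length=("video_length", "sum"),
      )
      # DataFrame.sort no longer exists; sort_values is the current equivalent.
      .sort_values("ios_id_count", ascending=False)
)

print(df_new)
```

Because every output column gets its own keyword, nothing is overwritten, and the result has flat column names rather than a MultiIndex.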
For more details, please refer to the question "Naming returned columns in Pandas aggregate function".
Upvotes: 1