What I really want to do can be expressed in SQL like this:
SELECT v1, v2, COUNT(*) AS v_count FROM my_table GROUP BY 1,2
That is, I want to create a new data frame composed of three columns: (v1, v2, v_count).
Here is what I tried with pandas:
grp = df.groupby(['v1', 'v2']) # GROUP BY v1, v2
cnt = grp.count() # get v_count for each group
But how do I put them together into a new data frame?
Upvotes: 0
Views: 473
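After the groupby, v1 and v2 are in the index, so you can select one column, aggregate it under the name v_count, and then reset the index. With named aggregation (pandas 0.25 or later; the older dict form agg({'v_count': np.size}) was removed in pandas 1.0), e.g.:
df.groupby(['v1', 'v2'])['v1'].agg(v_count='size').reset_index()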
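Alternatively, you can pass the as_index=False keyword argument to groupby instead of calling reset_index. In pandas 1.1 or later, size() then returns a data frame whose count column is named size, so rename it, e.g.:
df.groupby(['v1', 'v2'], as_index=False).size().rename(columns={'size': 'v_count'})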
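For reference, here is a minimal, self-contained sketch of the same GROUP BY count. It uses size() with reset_index(name=...) rather than agg, which is one common way to do this and not necessarily the only one. The sample data is made up purely for illustration.

```python
import pandas as pd

# Hypothetical sample data standing in for my_table
df = pd.DataFrame({
    "v1": ["a", "a", "b", "b", "b"],
    "v2": ["x", "x", "x", "y", "y"],
})

# size() counts the rows in each (v1, v2) group and returns a Series
# indexed by v1 and v2; reset_index(name=...) turns the index levels
# back into columns and names the count column v_count.
result = df.groupby(["v1", "v2"]).size().reset_index(name="v_count")

print(result)
```

Unlike count(), which counts non-null values per remaining column, size() counts rows, so it matches COUNT(*) even when the frame has no columns besides v1 and v2.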
Upvotes: 1