Reputation: 9804
Suppose I have a Pandas DataFrame called df
with columns a
and b
and what I want is the number of distinct values of b
per each a
. I would do:
distcounts = df.groupby('a')['b'].nunique()
which gives the desidered result, but it is as Series object rather than another DataFrame. I'd like a DataFrame instead. In regular SQL, I'd do:
SELECT a, COUNT(DISTINCT(b)) FROM df
and haven't been able to emulate this query in Pandas exactly. How to?
Upvotes: 3
Views: 1294
Reputation: 29719
Another alternative using Groupby.agg
instead:
df.groupby('a', as_index=False).agg({'b': 'nunique'})
Upvotes: 3
Reputation: 863236
I think you need reset_index
:
distcounts = df.groupby('a')['b'].nunique().reset_index()
Sample:
df = pd.DataFrame({'a':[7,8,8],
'b':[4,5,6]})
print (df)
a b
0 7 4
1 8 5
2 8 6
distcounts = df.groupby('a')['b'].nunique().reset_index()
print (distcounts)
a b
0 7 1
1 8 2
Upvotes: 9