Reputation: 15207
Suppose I have a DataFrame such as,
In [7]: source = pd.DataFrame([['amazon.com', 'correct', 'correct'], ['amazon.com', 'incorrect', 'correct'], ['walmart.com', 'incorrect', 'correct'], ['walmart.com', 'incorrect', 'incorrect']], columns=['domain', 'price', 'product'])
In [8]: source
Out[8]:
domain price product
0 amazon.com correct correct
1 amazon.com incorrect correct
2 walmart.com incorrect correct
3 walmart.com incorrect incorrect
I would like to count, for each domain
, the number of times price == 'correct'
and price == 'incorrect'
, and the same for product
. In other words, I'd like to see output like so,
domain key value count
0 amazon.com price correct 1
1 amazon.com price incorrect 1
2 amazon.com product correct 2
3 walmart.com price incorrect 2
4 walmart.com product correct 1
5 walmart.com product incorrect 1
How do I do this?
Upvotes: 4
Views: 3275
Reputation: 15207
In [17]: %paste
(
pd.melt(source, id_vars=['domain'], value_vars=['price', 'product'])
.groupby(['domain', 'variable', 'value'])
.size()
.reset_index()
.rename(columns={'variable': 'key', 0: 'count'})
)
## -- End pasted text --
Out[17]:
domain key value count
0 amazon.com price correct 1
1 amazon.com price incorrect 1
2 amazon.com product correct 2
3 walmart.com price incorrect 2
4 walmart.com product correct 1
5 walmart.com product incorrect 1
Upvotes: 0
Reputation: 128958
A nested apply will do it
In [24]: source.groupby('domain').apply(lambda x:
x[['price','product']].apply(lambda y: y.value_counts())).fillna(0)
Out[24]:
price product
domain
amazon.com correct 1 2
incorrect 1 0
walmart.com correct 0 1
incorrect 2 1
Upvotes: 7