Reputation: 5569
I have a pandas DataFrame df:
import pandas
df = pandas.DataFrame(
data=[["A", "Man"], ["A", "Woman"], ["A", "Man"], ["A", "Man"], ["B", "Woman"]],
columns=["category", "gender"],
)
df
  category gender
0        A    Man
1        A  Woman
2        A    Man
3        A    Man
4        B  Woman
and I count how many men and women are in each category:
grouped = df.groupby(by=["category", "gender"])["gender"].count()
grouped
category  gender
A         Man       3
          Woman     1
B         Woman     1
Name: gender, dtype: int64
How can I get a list of the categories in which both men and women appear at least once? For the data above, the expected result is:
category_list = ["A"]
Upvotes: 1
Views: 43
Reputation: 150755
IIUC,
s = df.groupby('category')['gender'].value_counts().unstack(fill_value=0)
s[s.ge(1).all(axis=1)]
gives you
gender    Man  Woman
category
A           3      1
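To get the plain list the question asks for, one option is to take the index of the filtered frame. This is a minimal sketch built on the same `s`; the `category_list` name comes from the question, and the `>= 1` threshold is the reading assumed in this answer.

```python
import pandas

df = pandas.DataFrame(
    data=[["A", "Man"], ["A", "Woman"], ["A", "Man"], ["A", "Man"], ["B", "Woman"]],
    columns=["category", "gender"],
)

# One row per category, one column per gender; missing combinations become 0.
s = df.groupby("category")["gender"].value_counts().unstack(fill_value=0)

# Keep categories where every gender column has a count of at least 1,
# then read the surviving index labels out as a Python list.
category_list = s[s.ge(1).all(axis=1)].index.tolist()
print(category_list)  # ['A']
```

Changing `ge(1)` to `gt(1)` would require strictly more than one of each gender, which leaves the list empty for this sample data.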
Upvotes: 2
Reputation: 9018
You can convert the result to a DataFrame and then filter it with query:
pandas.DataFrame(grouped).query("gender > 1")
                 gender
category gender
A        Man          3
Or you can directly do:
grouped[grouped > 1]
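Note that this filter keeps individual (category, gender) pairs, so on its own it does not check that both genders are present. A rough sketch of pulling category names out of the filtered Series, plus a variant that requires every gender, might look like this; the names `filtered`, `wide`, `categories_above_one` and `categories_with_both` are made up for illustration.

```python
import pandas

df = pandas.DataFrame(
    data=[["A", "Man"], ["A", "Woman"], ["A", "Man"], ["A", "Man"], ["B", "Woman"]],
    columns=["category", "gender"],
)
grouped = df.groupby(by=["category", "gender"])["gender"].count()

# Pairs with a count above 1; only (A, Man) qualifies in the sample data.
filtered = grouped[grouped > 1]
categories_above_one = (
    filtered.index.get_level_values("category").unique().tolist()
)

# Variant (an assumption about the intent): pivot gender into columns and
# require every gender to appear at least once in the category.
wide = grouped.unstack(fill_value=0)
categories_with_both = wide[(wide >= 1).all(axis=1)].index.tolist()

print(categories_above_one)
print(categories_with_both)
```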
Upvotes: 1