Reputation: 139
I have a DataFrame with lots of categories, but I'm only trying to use two. I managed to get the result I wanted, but it wasn't accepted in my project ('there are better ways of doing it'). Working with two columns, Gender (M/F) and Showed (1/0), I'm trying to get four variables out: male1, male0, female1, female0, so I can create a bar chart with them.
I was told to use the pd.Series.map function, but I've looked everywhere and can't find a good example of it. I'm also not really sure how to get four variables out of it.
Thanks for any help.
Upvotes: 0
Views: 935
Reputation: 59579
This seems like a case for pd.crosstab (it's a built-in function :D):
import pandas as pd
df = pd.DataFrame([['M', 0], ['M', 1], ['M', 1], ['F', 0], ['F', 0], ['F', 1]],
                  columns=['Gender', 'Showed'])
pd.crosstab(df.Gender, df.Showed)
Showed  0  1
Gender
F       2  1
M       1  2
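If four separate numbers are still needed for the bar chart, they can be read straight out of the crosstab result. A minimal sketch, assuming the same sample data as above; the names ct, male0, male1, female0 and female1 are only illustrative:

```python
import pandas as pd

df = pd.DataFrame([['M', 0], ['M', 1], ['M', 1], ['F', 0], ['F', 0], ['F', 1]],
                  columns=['Gender', 'Showed'])

# Rows are Gender values, columns are Showed values
ct = pd.crosstab(df['Gender'], df['Showed'])

# Look up each cell by (row label, column label)
male0 = ct.loc['M', 0]
male1 = ct.loc['M', 1]
female0 = ct.loc['F', 0]
female1 = ct.loc['F', 1]
```

With matplotlib installed, ct.plot(kind='bar') should draw grouped bars directly from the table, so the intermediate variables may not be needed at all.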
Upvotes: 1
Reputation: 164843
pd.Series.map is unnecessary. You can use GroupBy here and output a dictionary:
df = pd.DataFrame([['M', 0], ['M', 1], ['M', 1], ['F', 0], ['F', 0], ['F', 1]],
                  columns=['Gender', 'Showed'])
d = df.groupby(['Gender', 'Showed']).size().to_dict()
# {('F', 0): 2, ('F', 1): 1, ('M', 0): 1, ('M', 1): 2}
In general, you should avoid creating a variable number of variables. A dictionary allows you to extract values efficiently, e.g. via d[('F', 0)] for Female gender and Showed equal to 0.
But if you really must use map, you can use the pd.Index.map version:
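# Series of counts indexed by (Gender, Showed)
d = df.groupby(['Gender', 'Showed']).size()
# .copy() avoids a possible SettingWithCopyWarning on the assignment below
res = df.drop_duplicates().copy()
res['Counts'] = res.set_index(['Gender', 'Showed']).index.map(d.get)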
print(res)
  Gender  Showed  Counts
0      M       0       1
1      M       1       2
3      F       0       2
5      F       1       1
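Since the question asks for an example of pd.Series.map itself, here is one way it could fit in. This is only a sketch of a possible approach, not necessarily what the reviewer had in mind: map relabels Gender, the label is joined with Showed, and value_counts does the counting. The mapping dict and the names gender, labels and counts are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame([['M', 0], ['M', 1], ['M', 1], ['F', 0], ['F', 0], ['F', 1]],
                  columns=['Gender', 'Showed'])

# Series.map with a dict translates each value: 'M' -> 'male', 'F' -> 'female'
gender = df['Gender'].map({'M': 'male', 'F': 'female'})

# Build labels such as 'male1' or 'female0'
labels = gender + df['Showed'].astype(str)

# Count how often each label occurs and convert to a plain dict
counts = labels.value_counts().to_dict()
```

counts['male1'] then gives the number of males with Showed equal to 1, and the same dict can feed a bar chart.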
Upvotes: 1
Reputation: 636
You can do this in 4 simple lines.
male0 = ((df['Gender'] == 'M') & (df['Showed'] == 0)).sum()
female0 = ((df['Gender'] == 'F') & (df['Showed'] == 0)).sum()
male1 = ((df['Gender'] == 'M') & (df['Showed'] == 1)).sum()
female1 = ((df['Gender'] == 'F') & (df['Showed'] == 1)).sum()
Using apply: since the condition involves two Series (columns) rather than one, you need a row-wise apply with axis=1.
male0 = df[['Gender', 'Showed']].apply(lambda row: row['Gender'] == 'M' and row['Showed'] == 0, axis=1).sum()
female0 = df[['Gender', 'Showed']].apply(lambda row: row['Gender'] == 'F' and row['Showed'] == 0, axis=1).sum()
male1 = df[['Gender', 'Showed']].apply(lambda row: row['Gender'] == 'M' and row['Showed'] == 1, axis=1).sum()
female1 = df[['Gender', 'Showed']].apply(lambda row: row['Gender'] == 'F' and row['Showed'] == 1, axis=1).sum()
Using groupby:
counts = df.groupby(['Gender', 'Showed']).size().reset_index(name='Count')
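The groupby result is a long-format table with one row per (Gender, Showed) combination. A sketch of how it might be reshaped for plotting, assuming the sample df used in the earlier answers; the name wide is illustrative, and pivot is just one option for getting a wide layout:

```python
import pandas as pd

df = pd.DataFrame([['M', 0], ['M', 1], ['M', 1], ['F', 0], ['F', 0], ['F', 1]],
                  columns=['Gender', 'Showed'])

# One row per (Gender, Showed) pair, with the pair's frequency in 'Count'
counts = df.groupby(['Gender', 'Showed']).size().reset_index(name='Count')

# Pivot to a wide table: one row per Gender, one column per Showed value
wide = counts.pivot(index='Gender', columns='Showed', values='Count')
```

Individual values are then available as, for example, wide.loc['F', 0].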
Upvotes: 0