Reputation: 524
I have the following code for example.
df = pd.DataFrame(dtype="category")
df["Gender"]=np.random.randint(2, size=100)
df["Q1"] = np.random.randint(3, size=100)
df["Q2"] = np.random.randint(3, size=100)
df["Q3"] = np.random.randint(3, size=100)
df[["Gender", "Q1", "Q2", "Q3"]] = df[["Gender", "Q1", "Q2", "Q3"]].astype('category')
pd.pivot_table(data=df,index=["Gender"])
I want to have a pivot table with percentages over gender for all the other columns. Infact, like the follwing.
How to achieve this?
The above code gives an error saying that
No numeric types to aggregate
I dont have any numerical columns. I just want to find the frequency in each category under male and female and find the percentage of them over male and female respectively.
Upvotes: 1
Views: 3237
Reputation: 858
As suggested by your question, you can use the pd.crosstab
to make the cross tabulation you need.
You just need to do a quick preprocessing with your data, which is to melt
and convert Q
columns to rows (see details below):
df = df.melt(id_vars='Gender',
value_vars=['Q1', 'Q2', 'Q3'],
var_name='Question', value_name='Answer' )
Then you can use pd.crosstab
and calculate percentage as needed (here the percentage for each Question
per Gender
per Answer
is shown)
pd.crosstab(df.Question, columns=[df.Gender, df.Answer]).apply(lambda row: row/row.sum(), axis=1)
Gender 0 1
Answer 0 1 2 0 1 2
Question
Q1 0.13 0.18 0.18 0.13 0.19 0.19
Q2 0.09 0.21 0.19 0.22 0.13 0.16
Q3 0.19 0.10 0.20 0.16 0.18 0.17
Details
df.head()
Gender Q1 Q2 Q3
0 1 0 2 0
1 1 0 0 1
2 0 2 0 2
3 0 0 2 0
4 0 1 1 1
df.melt().head()
Gender Question Answer
0 1 Q1 0
1 1 Q1 0
2 0 Q1 2
3 0 Q1 0
4 0 Q1 1
Upvotes: 2