How to have a cross tabulation for categorical data in Pandas (Python)?

Question

I have the following code for example.

df = pd.DataFrame(dtype="category")
df["Gender"]=np.random.randint(2, size=100)
df["Q1"] = np.random.randint(3, size=100)
df["Q2"] = np.random.randint(3, size=100)
df["Q3"] = np.random.randint(3, size=100)
df[["Gender", "Q1", "Q2", "Q3"]] = df[["Gender", "Q1", "Q2", "Q3"]].astype('category')
pd.pivot_table(data=df,index=["Gender"])

I want to have a pivot table with percentages over gender for all the other columns. Infact, like the follwing.

How to achieve this?

The above code gives an error saying that No numeric types to aggregate

I dont have any numerical columns. I just want to find the frequency in each category under male and female and find the percentage of them over male and female respectively.

stahamtan · Accepted Answer

As suggested by your question, you can use the pd.crosstab to make the cross tabulation you need.

You just need to do a quick preprocessing with your data, which is to melt and convert Q columns to rows (see details below):

df = df.melt(id_vars='Gender', 
         value_vars=['Q1', 'Q2', 'Q3'], 
         var_name='Question', value_name='Answer' )

Then you can use pd.crosstab and calculate percentage as needed (here the percentage for each Question per Gender per Answer is shown)

pd.crosstab(df.Question, columns=[df.Gender, df.Answer]).apply(lambda row: row/row.sum(), axis=1)

Gender       0                 1            
Answer       0     1     2     0     1     2
Question                                    
Q1        0.13  0.18  0.18  0.13  0.19  0.19
Q2        0.09  0.21  0.19  0.22  0.13  0.16
Q3        0.19  0.10  0.20  0.16  0.18  0.17

Details

df.head()

  Gender Q1 Q2 Q3
0      1  0  2  0
1      1  0  0  1
2      0  2  0  2
3      0  0  2  0
4      0  1  1  1

df.melt().head()

  Gender Question Answer
0      1       Q1      0
1      1       Q1      0
2      0       Q1      2
3      0       Q1      0
4      0       Q1      1

How to have a cross tabulation for categorical data in Pandas (Python)?

Answers (1)

Related Questions