How to get a cross tabulation with pandas crosstab that would display the frequency of multiple values of a column variable?

Question

Let's say i have a dataframe:

df = pd.DataFrame(np.random.randint(0,5, size=(5,6)), columns=list('ABCDEF'))

Crossing variables with pd.crosstab is simple enough:

table = pd.crosstab(index=df['A'], columns=df['B'])

Yields:

B  1  2  3  4
A            
0  1  0  0  0
1  0  0  0  1
2  0  1  1  0
3  0  1  0  0

Where I would for example want a table like this:

B  (1+2+3) 1  2  3  4
A            
0     1    1  0  0  0
1     0    0  0  0  1
2     2    0  1  1  0
3     1    0  1  0  0

Can anyone set me on the right track here?

jezrael · Accepted Answer

Use sum with subset, but if use small random df there can be problem you get always another values so values of columns will be different. If use np.random.seed(100) get same test output as my answer.

table['(1+2+3)'] = table[[1,2,3]].sum(axis=1)

Sample:

np.random.seed(100)
df = pd.DataFrame(np.random.randint(0,5, size=(5,6)), columns=list('ABCDEF'))
table = pd.crosstab(index=df['A'], columns=df['B'])
table['(1+2+3)'] = table[[1,2,3]].sum(axis=1)
print (table)
B  0  1  2  3  4  (1+2+3)
A                        
0  1  0  0  0  1        0
1  0  0  0  1  0        1
2  0  0  1  0  0        1
3  0  1  0  0  0        1

How to get a cross tabulation with pandas crosstab that would display the frequency of multiple values of a column variable?

Answers (1)

Related Questions