How to get a cross tabulation with pandas crosstab that would display the frequency of multiple values of a column variable?

Let's say i have a dataframe:

df = pd.DataFrame(np.random.randint(0,5, size=(5,6)), columns=list('ABCDEF'))

Crossing variables with pd.crosstab is simple enough:

table = pd.crosstab(index=df['A'], columns=df['B'])

Yields:

B  1  2  3  4
A            
0  1  0  0  0
1  0  0  0  1
2  0  1  1  0
3  0  1  0  0

Where I would for example want a table like this:

B  (1+2+3) 1  2  3  4
A            
0     1    1  0  0  0
1     0    0  0  0  1
2     2    0  1  1  0
3     1    0  1  0  0

Can anyone set me on the right track here?

Upvotes: 2

Views: 3045

Answers (1)

jezrael
jezrael

Reputation: 862741

Use sum with subset, but if use small random df there can be problem you get always another values so values of columns will be different. If use np.random.seed(100) get same test output as my answer.

table['(1+2+3)'] = table[[1,2,3]].sum(axis=1)

Sample:

np.random.seed(100)
df = pd.DataFrame(np.random.randint(0,5, size=(5,6)), columns=list('ABCDEF'))
table = pd.crosstab(index=df['A'], columns=df['B'])
table['(1+2+3)'] = table[[1,2,3]].sum(axis=1)
print (table)
B  0  1  2  3  4  (1+2+3)
A                        
0  1  0  0  0  1        0
1  0  0  0  1  0        1
2  0  0  1  0  0        1
3  0  1  0  0  0        1

Upvotes: 2

Related Questions