Reputation: 85
I'm doing cross-tabulation between two columns in the dataframe. Here's a sample from the columns:
column_1 column_2
A -8
B 95
A -93
D 11
C -62
D -14
A -55
C 66
B 76
D -49
I'm looking for a code that returns sub totals for A, B, C and D. For instance, for A the subtotal will be -156 (-8-93-55 = -156).
I tried to do that with pandas.crosstab()
function:
pandas.crosstab(df[column_1], df[column_2], margins=True, margins_name=column_1).Total
Here's a sample of the output:
-271 -263 -241 -223 -221 -212 -207 -201 ... sum_column
A 1 0 1 0 0 1 0 0 ... ##
B 0 0 0 1 0 0 0 0 ... ##
C 0 0 0 0 1 0 0 1 ... ##
D 0 1 0 0 0 0 1 0 ... ##
The sum column consists of the sums of the boolean values in each row, instead of the sub totals for each of the four letters. I saw once that a boolean table can be used for calculations but I quite sure that by changing the pandas.crosstab()
command the desired output can be achieved.
I'd be happy to get some ideas and thoughts from you.
Thanks.
Upvotes: 0
Views: 214
Reputation: 72
If you'd simply like the totals by the individual categories in column_1
(A, B, C, D), maybe a groupby and summation could be helpful! You would call the groupby
on the column with your categories, and then call sum on the result, like this:
df.groupby('column1').sum()
Upvotes: 1