rAmAnA

Reputation: 2031

pandas DataFrame sum method works counterintuitively

import numpy as np
from pandas import DataFrame

my_df = DataFrame(np.arange(1, 13).reshape(4, 3), columns=list('abc'))

my_df.sum(axis="rows")

Output:

a    22
b    26
c    30

I expected it to sum across each row, giving:

0     6
1    15
2    24
3    33

Instead, my_df.sum(axis="columns") is what produces this result.

Why does it work counterintuitively? In a similar context, the drop method works as expected, i.e. when I write

my_df.drop(['a'],axis="columns") 

it drops column "a".

Am I missing something? Please enlighten.

Upvotes: 2

Views: 67

Answers (1)

Anton vBR

Reputation: 18906

Short version

It is a naming convention: the axis you name is the one that gets summed over. The sum over the columns gives a row-wise sum, so you are looking for axis='columns'.
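A minimal sketch of that convention, reusing the 4x3 frame from the question (the variable names below are illustrative only). It suggests that sum and drop follow the same rule: the axis you name is the one that shrinks. Summing over "index" leaves one value per column label, summing over "columns" leaves one value per row label, and drop removes labels from whichever axis you name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(1, 13).reshape(4, 3), columns=list("abc"))

# axis="index" (0): collapse the index, leaving one total per column label.
col_totals = df.sum(axis="index")
# axis="columns" (1): collapse the columns, leaving one total per row label.
row_totals = df.sum(axis="columns")

# drop follows the same rule: labels are removed from the axis you name.
without_a = df.drop(["a"], axis="columns")
without_row0 = df.drop([0], axis="index")

print(col_totals.index.tolist())   # ['a', 'b', 'c']
print(row_totals.index.tolist())   # [0, 1, 2, 3]
print(without_a.shape)             # (4, 2)
print(without_row0.shape)          # (3, 3)
```

Read this way, drop(['a'], axis="columns") and sum(axis="columns") are consistent: both act on the columns axis, one by removing a label from it and the other by collapsing it.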


Long version

Ok, that was interesting. In pandas, axis 0 refers to the index (the rows) and axis 1 to the columns. Looking in the docs, we find that the allowed values are:

axis : {index (0), columns (1)}

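Passing axis="rows" does not fall back to some default: "rows" is accepted as an alias for "index" (axis 0), which also happens to be the default, so your call is the same as df.sum(axis=0). The axis you name is the one that gets collapsed: the sum over the index returns one total per column, and the sum over the columns returns one total per row. What you want is axis=1 or axis='columns', which gives your desired output: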

import pandas as pd
import numpy as np

df = pd.DataFrame(np.arange(1,13).reshape(4,3), columns=list('abc'))

print(df.sum(axis=1))

Returns:

0     6
1    15
2    24
3    33
dtype: int64
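To check how pandas treats the different axis spellings, here is a small sketch using the same frame. It compares the default call against axis=0, "index" and "rows", compares axis=1 against "columns", and tries a deliberately misspelled name ("colums" is a hypothetical typo, not a real option) to see whether it is rejected or silently ignored.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(1, 13).reshape(4, 3), columns=list("abc"))

# The default reduction axis, compared with each spelling of axis 0.
default = df.sum()
same_as_0 = default.equals(df.sum(axis=0))
same_as_index = default.equals(df.sum(axis="index"))
same_as_rows = default.equals(df.sum(axis="rows"))

# axis=1 and axis="columns" should both give one total per row.
per_row = df.sum(axis=1)
same_as_columns = per_row.equals(df.sum(axis="columns"))

# An unknown axis name: record whether pandas raises ValueError.
rejected = False
try:
    df.sum(axis="colums")  # deliberate, hypothetical typo
except ValueError as err:
    rejected = True
    print("rejected:", err)

print(same_as_0, same_as_index, same_as_rows, same_as_columns, rejected)
```

If all five flags come out True, "rows" is simply another name for axis 0, and a misspelled axis raises an error instead of defaulting.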

Upvotes: 1
