Reputation: 3836
So say I have the following table:
In [2]: df = pd.DataFrame({'a': [1,2,3], 'b':[2,4,6], 'c':[1,1,1]})
In [3]: df
Out[3]:
   a  b  c
0  1  2  1
1  2  4  1
2  3  6  1
I can sum a and b this way:
In [4]: sum(df['a']) + sum(df['b'])
Out[4]: 18
However, this is not very convenient for a larger DataFrame, where you have to sum many columns together.
Is there a neater way to sum columns (similar to the below)? What if I want to sum the entire DataFrame without specifying the columns?
In [4]: sum(df[['a', 'b']]) #that will not work!
Out[4]: 18
In [4]: sum(df) #that will not work!
Out[4]: 21
Upvotes: 15
Views: 40403
Reputation: 486
Maybe you are looking for something like this:
df["result"] = df.apply(lambda row: row['a' : 'c'].sum(),axis=1)
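As a rough sketch of what the line above computes: a label slice such as 'a':'c' includes both endpoints, so each row's result covers a, b and c. The variable names below are only illustrative, and the `.loc` version is one possible vectorized equivalent, not necessarily what the answer intended.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 4, 6], 'c': [1, 1, 1]})

# Row-wise sum via apply; the label slice 'a':'c' includes column 'c'.
row_sums_apply = df.apply(lambda row: row['a':'c'].sum(), axis=1)

# Same row-wise sums without a Python-level lambda per row.
row_sums_vectorized = df.loc[:, 'a':'c'].sum(axis=1)

# To match the question (only a and b), slice 'a':'b' and sum the row sums.
total_ab = df.loc[:, 'a':'b'].sum(axis=1).sum()
print(total_ab)
```

Note that this gives one sum per row; a single total still needs a second `.sum()` on the result.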
Upvotes: 1
Reputation: 862481
I think you can use a double sum: the first, DataFrame.sum, creates a Series of column sums, and the second, Series.sum, gets the sum of that Series:
print (df[['a','b']].sum())
a 6
b 12
dtype: int64
print (df[['a','b']].sum().sum())
18
You can also sum across each row first with axis=1:
print (df[['a','b']].sum(axis=1))
0 3
1 6
2 9
dtype: int64
print (df[['a','b']].sum(axis=1).sum())
18
Thank you pirSquared for another solution: convert df to a NumPy array with values and then sum:
print (df[['a','b']].values.sum())
18
To sum the entire DataFrame without specifying columns:
print (df.sum().sum())
21
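For reuse, the approaches above could be wrapped in a small helper. This is only a sketch: `total_sum` is a hypothetical name, not a pandas function, and it assumes all selected columns are numeric. It also shows why the built-in `sum(df)` from the question fails: iterating over a DataFrame yields its column labels, so Python ends up trying to add 0 and 'a'.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 4, 6], 'c': [1, 1, 1]})

def total_sum(frame, columns=None):
    """Hypothetical helper: sum every value in the given columns (all columns if None)."""
    selected = frame if columns is None else frame[list(columns)]
    # to_numpy() returns the underlying values as a NumPy array, like .values
    return selected.to_numpy().sum()

print(total_sum(df, ['a', 'b']))
print(total_sum(df))

# The built-in sum iterates over column labels ('a', 'b', 'c'), not values.
try:
    sum(df)
    builtin_sum_failed = False
except TypeError:
    builtin_sum_failed = True
```

The helper returns a NumPy scalar rather than a plain Python int; wrap the result in `int()` if that matters.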
Upvotes: 23