Reputation: 5455
I have a DataFrame with numerical values. What is the simplest way of appending a row (with a given index value) that represents the sum of each column?
Upvotes: 94
Views: 149507
Reputation: 2147
For those that have trouble because the result is 0
or NaN
, check dtype
first.
df.dtypes
Since sum can only process numeric try to change the type of your dataframe first. In this example, chang to int32
for integer.
df = df.astype('int32')
df.dtypes
Then, you should be able to sum across row and add new column (as the accepted answer, not the question).
df['sum']= df.sum(numeric_only=True,axis=1)
Bonus: Sort the sum column
df.sort_values(by=['sum'])
Upvotes: 0
Reputation: 17164
This gives total on both rows and columns:
import numpy as np
import pandas as pd
df = pd.DataFrame({'a': [10,20],'b':[100,200],'c': ['a','b']})
df.loc['Column_Total']= df.sum(numeric_only=True, axis=0)
df.loc[:,'Row_Total'] = df.sum(numeric_only=True, axis=1)
print(df)
a b c Row_Total
0 10.0 100.0 a 110.0
1 20.0 200.0 b 220.0
Column_Total 30.0 300.0 NaN 330.0
Upvotes: 56
Reputation: 1989
I did not find the modern pandas approach! This solution is a bit dirty due to two chained transposition, I do not know how to use .assign
on rows.
# Generate DataFrame
import pandas as pd
df = pd.DataFrame({'a': [10,20],'b':[100,200],'c': ['a','b']})
# Solution
df.T.assign(Total = lambda x: x.sum(axis=1)).T
output:
a b c Total
0 10 100 a 110
1 20 200 b 220
Upvotes: 0
Reputation: 788
new_sum_col = list(df.sum(axis=1))
df['new_col_name'] = new_sum_col
Upvotes: 1
Reputation: 1243
To add a Total
column which is the sum across the row:
df['Total'] = df.sum(axis=1)
Upvotes: 111
Reputation: 602
You can use the append
method to add a series with the same index as the dataframe to the dataframe. For example:
df.append(pd.Series(df.sum(),name='Total'))
Upvotes: 5
Reputation: 61
I have done it this way:
df = pd.concat([df,pd.DataFrame(df.sum(axis=0),columns=['Grand Total']).T])
this will add a column of totals for each row:
df = pd.concat([df,pd.DataFrame(df.sum(axis=1),columns=['Total'])],axis=1)
It seems a little annoying to have to turn the Series
object (or in the answer above, dict
) back into a DataFrame and then append it, but it does work for my purpose.
It seems like this should just be a method of the DataFrame
- like pivot_table has margins.
Perhaps someone knows of an easier way.
Upvotes: 5
Reputation: 186
One way is to create a DataFrame with the column sums, and use DataFrame.append(...). For example:
import numpy as np
import pandas as pd
# Create some sample data
df = pd.DataFrame({"A": np.random.randn(5), "B": np.random.randn(5)})
# Sum the columns:
sum_row = {col: df[col].sum() for col in df}
# Turn the sums into a DataFrame with one row with an index of 'Total':
sum_df = pd.DataFrame(sum_row, index=["Total"])
# Now append the row:
df = df.append(sum_df)
Upvotes: 8