Reputation: 99
I have a dataframe (called df) that currently looks like this:
Date Amount
01/11/2019 -0.4
01/11/2019 -15.81
01/11/2019 -21.98
31/10/2019 -5.27
30/10/2019 -1.5
30/10/2019 -20
30/10/2019 -5,000
I would like to sum the column "Amount" up. To do so, I have taken the following steps:
df['Amount'] = df['Amount'].str.replace(',', '')
pd.to_numeric(df['Amount'])
df['Amount'].sum()
However, when I try to sum it, I get a string, even though the column "Amount" is clearly a float:
'-0.4-15.81-21.98-5.27-1.5-20-5000'
Does anyone have any advice on how to solve this? I've been stuck on this for a while!
Thank you!
Upvotes: 3
Views: 498
Reputation: 2960
There is actually a thousands argument in pd.read_csv that can help you convert all the values to numeric when the data is read. See the mockup below. Let me know if it works.
import pandas as pd
from io import StringIO  # Python 3 location of StringIO
Mydata = StringIO("""Date Amount
01/11/2019 -0.4
01/11/2019 -15.81
01/11/2019 -21.98
31/10/2019 -5.27
30/10/2019 -1.5
30/10/2019 -20
30/10/2019 -5,000
""")
df = pd.read_csv(Mydata, sep=" ",engine='python', thousands=',')
df
Result:
Date Amount
0 01/11/2019 -0.40
1 01/11/2019 -15.81
2 01/11/2019 -21.98
3 31/10/2019 -5.27
4 30/10/2019 -1.50
5 30/10/2019 -20.00
6 30/10/2019 -5000.00
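To confirm that the column really was parsed as numbers, a quick check along these lines may help. This is only a minimal sketch: it assumes Python 3 with pandas installed, and the shortened sample data and the names raw and total are illustrative, not from the question.

```python
from io import StringIO

import pandas as pd

# Shortened sample in the same space-separated layout as the question.
raw = StringIO("""Date Amount
01/11/2019 -0.4
30/10/2019 -20
30/10/2019 -5,000
""")

# thousands=',' tells read_csv to treat the comma as a digit-group separator.
df = pd.read_csv(raw, sep=" ", thousands=",")

print(df["Amount"].dtype)   # should be a numeric dtype rather than object
total = df["Amount"].sum()  # arithmetic sum, not string concatenation
print(total)
```

If the dtype printed is still object, the column contains something else that is not a number, and the sum will not be numeric either.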
Upvotes: 0
Reputation: 173
When you call pd.to_numeric(df['Amount']), it returns a converted copy of the column 'Amount' but does not modify the column in the dataframe. In an interactive session the converted result is only held in the '_' variable, which is overwritten by the next output.
You need to write df['Amount'] = pd.to_numeric(df['Amount']) to replace the actual column in the dataframe.
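The difference between discarding and assigning the result can be sketched as follows. This assumes Python 3 with pandas installed; the three sample values and the names before and total are only for illustration.

```python
import pandas as pd

# Sample values held as text, as in the question.
df = pd.DataFrame({"Amount": ["-0.4", "-15.81", "-5,000"]})
df["Amount"] = df["Amount"].str.replace(",", "")

pd.to_numeric(df["Amount"])    # result is thrown away; df is unchanged
before = df["Amount"].dtype    # still a text dtype at this point

# Assign the converted Series back so the dataframe holds numbers.
df["Amount"] = pd.to_numeric(df["Amount"])
total = df["Amount"].sum()     # numeric sum

print(before, df["Amount"].dtype, total)
```

Checking df.dtypes after each step is a cheap way to see whether a conversion actually took effect.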
Upvotes: 0
Reputation: 1469
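Use the sum method that pandas provides. The axis argument sets the direction: axis=0 sums down each column, while axis=1 sums across each row, so totalling "Amount" needs axis=0. The column must already be numeric for the values to be added rather than concatenated.
df.sum(axis=0, skipna=True, numeric_only=True)
skipna=True skips NaN values.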
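As a small illustration of what axis and skipna do, here is a sketch on a made-up numeric dataframe. The columns a and b are hypothetical, and Python 3 with pandas is assumed.

```python
import pandas as pd

# Hypothetical numeric data with one missing value in column "a".
df = pd.DataFrame({"a": [1.0, 2.0, None], "b": [10.0, 20.0, 30.0]})

# axis=0 sums down each column, giving one total per column.
col_totals = df.sum(axis=0, skipna=True)

# axis=1 sums across each row, giving one total per row.
row_totals = df.sum(axis=1, skipna=True)

print(col_totals)
print(row_totals)
```

With skipna=True the missing value is ignored, so column a totals 3.0 and the last row totals 30.0.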
Upvotes: 0
Reputation: 962
You are almost there. You only need to change this line so that the converted result is assigned back to the column:
df['Amount'] = pd.to_numeric(df['Amount'])
Upvotes: 2