Reputation: 463
I have a pandas dataframe as per example:
mydf.head()
Date Merchant/Description Debit/Credit
0 10/05/2018 FAKE TRANSACTION 1 -£7.50
1 09/05/2018 FAKE TRANSACTION 2 -£5.79
2 09/05/2018 FAKE TRANSACTION 3 -£28.50
3 08/05/2018 FAKE TRANSACTION 4 -£3.99
4 08/05/2018 FAKE TRANSACTION 5 -£17.99
The column ['Debit/Credit'] has data type 'object'; it is a mixture of strings and NaN.
I wish to convert the strings to numbers. I use pandas.to_numeric to try to achieve this:
cols = ['Debit/Credit']
hsbcraw[cols] = hsbcraw[cols].apply(pd.to_numeric, errors='coerce')
This is turning all items in the column ['Debit/Credit'] to NaN:
mydf.head()
Date Merchant/Description Debit/Credit
0 10/05/2018 FAKE TRANSACTION 1 NaN
1 09/05/2018 FAKE TRANSACTION 2 NaN
2 09/05/2018 FAKE TRANSACTION 3 NaN
3 08/05/2018 FAKE TRANSACTION 4 NaN
4 08/05/2018 FAKE TRANSACTION 5 NaN
What is the error in my code or my approach?
Upvotes: 0
Views: 2002
Reputation: 6639
I usually do it like by converting to floats like this:
df['Debit/Credit'] = df['Debit/Credit'].replace('£', '', regex = True).astype('float')
Upvotes: 0
Reputation: 82755
You can also use regex
.
Ex:
import pandas as pd
df = pd.DataFrame({"Debit/Credit": ["-£7.50", "-£5.79", "-£28.50", "-£3.99", "-£17.99"]})
df["Debit/Credit"] = df["Debit/Credit"].str.extract("(\d*\.\d+)", expand=True).apply(pd.to_numeric)
print(df)
Output:
Debit/Credit
0 7.50
1 5.79
2 28.50
3 3.99
4 17.99
Upvotes: 0