jenna

Reputation: 87

Cannot convert string column to float in pandas

I want to add a new column called '2016 Salary ($)' to the DataFrame income. It should contain the employee pay from the column Salary Paid as a number, with the '$' and ',' characters removed.

But when I do this, I get the error:

'Could not convert string to float'

I tried to follow the hint, but it is not working:

income['2016 Salary ($)']= income['SalaryPaid'].str.strip('$').astype(float)
income['2016 Salary ($)'].apply(lambda X:X['Salary Paid'])
income

Upvotes: 2

Views: 551

Answers (4)

Subham Gupta

Reputation: 11

I created a dummy DataFrame matching your description and performed the same operation you mentioned, and it worked fine for me.

import pandas as pd
df = pd.DataFrame(columns=['AA','BB'])
df['AA'] = ['$12,20','$13,30']
df['BB'] = ['X','Y']
print(df)

Output:

       AA BB
0  $12,20  X
1  $13,30  Y

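# regex=False treats '$' as a literal character rather than an end-of-string anchor
df['AA'] = df['AA'].str.replace('$', '', regex=False).str.replace(',', '', regex=False).astype(float)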
print(df)

Output:

       AA BB
0  1220.0  X
1  1330.0  Y

In my opinion, the error is in the second line of your code, where you apply the lambda. Instead of "income['2016 Salary ($)'].apply(lambda X:X['Salary Paid'])" it should be "income['2016 Salary ($)'].apply(lambda X:X['SalaryPaid'])". I think there is a typo in the column name SalaryPaid.

Upvotes: 1

Derek Eden

Reputation: 4618

You can also do:

def convert(x):
    return float(x.replace('$','').replace(',',''))

income['2016 Salary ($)'] = income['Salary Paid'].apply(convert)

or

import re

def convert(x):
    # Keep only digits and the decimal point, then convert to float
    return float(''.join(re.findall(r'[\d.]', x)))
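For illustration, here is a minimal sketch of the regex variant applied to a small frame. The sample values and the income frame below are made up for the example and are not the asker's data.

```python
import re
import pandas as pd

def convert(x):
    # Keep only digits and the decimal point, drop everything else.
    return float(''.join(re.findall(r'[\d.]', x)))

# Hypothetical sample data standing in for the asker's table.
income = pd.DataFrame({'Salary Paid': ['$1,234.50', '$98,765.00']})

# Apply the converter to every value of the string column.
income['2016 Salary ($)'] = income['Salary Paid'].apply(convert)
```

Note that this pattern also drops a leading minus sign, so it is only suitable when all amounts are non-negative.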

Upvotes: 0

O.O

Reputation: 1298

Try something like this:

Data:

import pandas as pd

dic = {'Name':['John','Peter'],'SalaryPaid':['$204,546,289.35','$500,231,289.35'],'Year':['2008','2009']}
df1 = pd.DataFrame(dic)
df1

    Name    SalaryPaid      Year
0   John    $204,546,289.35 2008
1   Peter   $500,231,289.35 2009

Code:

df1['SalaryPaid'] = df1['SalaryPaid'].str.replace(',', '')
# If you want the result as a string:
df1['2016 Salary ($)'] = df1['SalaryPaid'].str.strip('$')
# If you want the result as a float:
# df1['2016 Salary ($)'] = df1['SalaryPaid'].str.strip('$').astype(float)

df1

Result:

    Name    SalaryPaid      Year    2016 Salary ($)
0   John    $204546289.35   2008    204546289.35
1   Peter   $500231289.35   2009    500231289.35

Upvotes: 2

jezrael

Reputation: 862406

Add Series.str.replace first:

income['2016 Salary ($)'] = (income['SalaryPaid'].str.replace(',', '')
                                                 .str.strip('$')
                                                 .astype(float))
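To see why the replace step matters, here is a minimal sketch on made-up data (the sample values and the income frame are assumptions for the example). Stripping only '$' leaves the thousands separators in place, and the float conversion then fails.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's table.
income = pd.DataFrame({'SalaryPaid': ['$1,234.50', '$98,765.00']})

# Stripping only '$' leaves values such as '1,234.50'.
stripped = income['SalaryPaid'].str.strip('$')

try:
    stripped.astype(float)
    failed = False
except ValueError:
    # The commas are still present, so the conversion to float fails.
    failed = True

# Removing the commas first lets the conversion succeed.
income['2016 Salary ($)'] = (
    income['SalaryPaid']
    .str.replace(',', '', regex=False)
    .str.strip('$')
    .astype(float)
)
```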

Or, a better solution if the DataFrame is created from a file: use the thousands parameter of read_csv:

income = pd.read_csv(file, thousands=',')

income['2016 Salary ($)']= income['SalaryPaid'].str.strip('$').astype(float)
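A different way to clean the values while loading is the converters parameter of read_csv, which runs a function on each raw string of a column. This is a sketch of that swapped-in technique, not the approach above; the helper name parse_currency and the inline CSV text are hypothetical.

```python
import io
import pandas as pd

def parse_currency(text):
    # Hypothetical helper: drop the currency symbol and thousands
    # separators, then convert the remaining text to float.
    return float(text.replace('$', '').replace(',', ''))

# Made-up CSV content; a real file path could be passed instead.
csv_data = io.StringIO(
    'Name,SalaryPaid\n'
    'John,"$204,546.35"\n'
    'Peter,"$500,231.00"\n'
)

# The converter is applied to every cell of SalaryPaid during parsing.
income = pd.read_csv(csv_data, converters={'SalaryPaid': parse_currency})
income['2016 Salary ($)'] = income['SalaryPaid']
```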

Upvotes: 2
