Reputation: 87
I want to add a new column called '2016 Salary ($)' to the DataFrame income, containing the employee pay from the column 'Salary Paid' as a number. To get a number, I need to strip the '$' and ',' characters from each value.
But when I do this, I get the error:
'Could not convert string to float'
I tried to follow the hint, but it does not work:
income['2016 Salary ($)']= income['SalaryPaid'].str.strip('$').astype(float)
income['2016 Salary ($)'].apply(lambda X:X['Salary Paid'])
income
Upvotes: 2
Views: 551
Reputation: 11
I created a dummy DataFrame matching your requirements and ran a similar conversion (using str.replace instead of str.strip), and it worked fine for me.
import pandas as pd
df = pd.DataFrame(columns=['AA','BB'])
df['AA'] = ['$12,20','$13,30']
df['BB'] = ['X','Y']
print(df)
Output:
       AA BB
0  $12,20  X
1  $13,30  Y
df['AA'] = df['AA'].str.replace('$', '', regex=False).str.replace(',', '', regex=False).astype(float)
print(df)
Output:
       AA BB
0  1220.0  X
1  1330.0  Y
I think the error is in the second line of your code, where you apply the lambda: instead of "income['2016 Salary ($)'].apply(lambda X:X['Salary Paid'])" it should be "income['2016 Salary ($)'].apply(lambda X:X['SalaryPaid'])". There seems to be a typo in the column name SalaryPaid.
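Independent of the column-name typo, a likely reason for the original error is that str.strip only removes characters from the start and end of each string, so the commas inside the number remain and astype(float) cannot parse it. A minimal sketch, using made-up values shaped like the question's data (the Series s is hypothetical):

```python
import pandas as pd

# Hypothetical sample values in the same format as the question's SalaryPaid column
s = pd.Series(['$204,546,289.35', '$500,231,289.35'])

# str.strip('$') only trims '$' from the two ends of each string
stripped = s.str.strip('$')
print(stripped.tolist())  # the commas are still there

failed = False
try:
    stripped.astype(float)
except ValueError as exc:
    failed = True
    print('astype(float) failed:', exc)

# Removing the commas as well lets the conversion succeed
cleaned = stripped.str.replace(',', '', regex=False).astype(float)
print(cleaned.tolist())
```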
Upvotes: 1
Reputation: 4618
You can also do:
def convert(x):
    # Remove the dollar sign and the thousands separators, then parse as float
    return float(x.replace('$', '').replace(',', ''))
income['2016 Salary ($)'] = income['Salary Paid'].apply(convert)
or
import re

def convert(x):
    # Keep only digits and the decimal point, then parse as float
    return float(''.join(re.findall(r'[\d.]', x)))
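A self-contained sketch applying both converters from this answer to a small DataFrame. The functions are renamed convert_replace and convert_regex here only so both can coexist; those names and the sample data are hypothetical:

```python
import re

import pandas as pd


def convert_replace(x):
    # Drop the dollar sign and the thousands separators, then parse
    return float(x.replace('$', '').replace(',', ''))


def convert_regex(x):
    # Keep only digits and the decimal point, then parse
    return float(''.join(re.findall(r'[\d.]', x)))


# Hypothetical sample data; the column name follows the answer above
income = pd.DataFrame({'Salary Paid': ['$204,546,289.35', '$1,200.00']})

income['2016 Salary ($)'] = income['Salary Paid'].apply(convert_replace)
via_regex = income['Salary Paid'].apply(convert_regex)
print(income)
print(via_regex.tolist())
```

On the design choice: the replace version raises ValueError if a value contains any other unexpected character, whereas the regex version silently discards everything except digits and '.', including a leading minus sign (so '-$5.00' becomes 5.0).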
Upvotes: 0
Reputation: 1298
Try something like this:
Data:
import pandas as pd
dic = {'Name':['John','Peter'],'SalaryPaid':['$204,546,289.35','$500,231,289.35'],'Year':['2008','2009']}
df1 = pd.DataFrame(dic)
df1
Name SalaryPaid Year
0 John $204,546,289.35 2008
1 Peter $500,231,289.35 2009
Code:
df1['SalaryPaid'] = df1['SalaryPaid'].str.replace(',', '')
# If you want the result as a string:
df1['2016 Salary ($)']= df1['SalaryPaid'].str.strip('$')
# If you want the result as a float:
#df1['2016 Salary ($)']= df1['SalaryPaid'].str.strip('$').astype(float)
df1
Result:
Name SalaryPaid Year 2016 Salary ($)
0 John $204546289.35 2008 204546289.35
1 Peter $500231289.35 2009 500231289.35
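The two cleanup steps above can also be collapsed into a single regular-expression replace. A sketch reusing the answer's sample data; unlike the answer, it leaves SalaryPaid unchanged and writes only the new column:

```python
import pandas as pd

# Same sample data as in the answer above
dic = {'Name': ['John', 'Peter'],
       'SalaryPaid': ['$204,546,289.35', '$500,231,289.35'],
       'Year': ['2008', '2009']}
df1 = pd.DataFrame(dic)

# The character class [$,] matches either a '$' or a ',' (inside [] '$' is literal),
# so one regex pass removes both; SalaryPaid itself is left unchanged
df1['2016 Salary ($)'] = df1['SalaryPaid'].str.replace(r'[$,]', '', regex=True).astype(float)
print(df1)
```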
Upvotes: 2
Reputation: 862406
Add Series.str.replace first to remove the commas, and wrap the chained calls in parentheses so they can span several lines:
income['2016 Salary ($)'] = (income['SalaryPaid'].str.replace(',', '')
                                                 .str.strip('$')
                                                 .astype(float))
Or, if you create the DataFrame from a file, a better solution is to use the thousands parameter in read_csv:
income = pd.read_csv(file, thousands=',')
income['2016 Salary ($)']= income['SalaryPaid'].str.strip('$').astype(float)
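One caveat, which I believe applies here: thousands=',' only takes effect on columns pandas can parse as numbers, and the '$' prefix will likely keep SalaryPaid as text, so its commas may survive. A sketch of an alternative that cleans each value during parsing with read_csv's converters parameter; the helper parse_dollars and the inline CSV text are hypothetical stand-ins for the real file:

```python
import io

import pandas as pd


def parse_dollars(value):
    # Hypothetical helper: turn a string like '$1,200.50' into 1200.5
    return float(value.replace('$', '').replace(',', ''))


# Hypothetical CSV contents standing in for the question's file;
# values containing commas are quoted, as a CSV writer would do
csv_text = 'Name,SalaryPaid\nJohn,"$204,546,289.35"\nPeter,"$500,231,289.35"\n'

# converters maps a column name to a function applied to each raw string value
income = pd.read_csv(io.StringIO(csv_text), converters={'SalaryPaid': parse_dollars})
print(income)
```

The cleaned floats land directly in SalaryPaid; copy or rename that column if you want the numbers under '2016 Salary ($)'.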
Upvotes: 2