Reputation: 2492
So I've essentially got this:
, pct_intl_student
2879 %
2880 9%
2881 NaN
2882 1%
2883 NaN
Name: pct_intl_student, Length: 2884, dtype: object
Would it be possible in some easy way to change all the strings with a percent sign in them to a decimal number? So basically this:
, pct_intl_student
2879 0
2880 0.09
2881 NaN
2882 0.01
2883 NaN
Name: pct_intl_student, Length: 2884, dtype: object
The NaN values must stay as NaN (they will be replaced with the average percentage afterwards), and rows that contain only the string '%' need to become 0.
I tried:
df['pct_intl_student'] = df['pct_intl_student'].str.rstrip('%').astype('float') / 100.0
But this raises this error:
ValueError: could not convert string to float:
So I'm kind of at a loss right now. Hopefully someone can help me out.
Upvotes: 8
Views: 11754
Reputation: 43494
Here is an example that better describes your issue:
df = pd.DataFrame({"a": ["9%", "10%", np.nan, '%']})
print(df)
# a
#0 9%
#1 10%
#2 NaN
#3 %
You want the string % to turn into the value 0.
One way is to change your code to use str.replace instead of str.rstrip. Here I replace each % with .0, so a bare % becomes .0, which parses as 0:
df['a'].str.replace(r'%', r'.0').astype('float') / 100.0
#0 0.09
#1 0.10
#2 NaN
#3 0.00
#Name: a, dtype: float64
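For reference, the original attempt fails because rstrip('%') turns the bare '%' row into an empty string, and an empty string cannot be cast to float. Below is a self-contained sketch of the replace approach on the toy column a from above, not the real data. Passing regex=False is my addition, to make the literal match explicit:

```python
import numpy as np
import pandas as pd

# Toy frame from the example above; the real column is pct_intl_student.
df = pd.DataFrame({"a": ["9%", "10%", np.nan, "%"]})

# "9%" -> "9.0", a bare "%" -> ".0" (which parses as 0.0), NaN stays NaN.
as_fraction = df["a"].str.replace("%", ".0", regex=False).astype(float) / 100.0
print(as_fraction)
```

This assumes a % only ever appears at the end of a value; a string with a % in the middle would not parse.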
Upvotes: 7
Reputation: 153460
Update:
df['pct_intl_student'] = (pd.to_numeric(df['pct_intl_student'].str[:-1])
.div(100)
.mask(df['pct_intl_student'] == '%', 0))
Output:
pct_intl_student
2879 0.00
2880 0.09
2881 NaN
2882 0.01
2883 NaN
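The same idea as a self-contained sketch on made-up data. The helper name percent_to_fraction is hypothetical, and it assumes every non-missing value ends in %:

```python
import numpy as np
import pandas as pd


def percent_to_fraction(col: pd.Series) -> pd.Series:
    """Turn strings like '9%' into 0.09; a bare '%' becomes 0, NaN stays NaN."""
    # Drop the trailing '%'. A bare '%' becomes '', which to_numeric reads as NaN.
    numbers = pd.to_numeric(col.str[:-1]).div(100)
    # Overwrite only the rows that were exactly '%' with 0.
    return numbers.mask(col == "%", 0)


# Made-up sample mirroring the rows shown in the question.
sample = pd.Series(["%", "9%", np.nan, "1%", np.nan], name="pct_intl_student")
converted = percent_to_fraction(sample)
print(converted)
```

Because the mask is built from the original column, the genuinely missing rows are left untouched and can still be filled with the mean later.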
Use:
df['pct_intl_student'] = pd.to_numeric(df['pct_intl_student'].str.strip('%')).div(100)
Or
df['pct_intl_student'] = pd.to_numeric(df['pct_intl_student'].str[:-1]).div(100)
Output:
2880 0.09
2881 NaN
2882 0.01
2883 NaN
Name: pct_intl_student, dtype: float64
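A small sketch on made-up data of why these two variants need the mask from the update. Stripping a bare % leaves an empty string, which pd.to_numeric treats as missing (as far as I know), so that row comes out as NaN instead of 0:

```python
import numpy as np
import pandas as pd

# Made-up sample: a bare '%', a normal percentage, and a real missing value.
sample = pd.Series(["%", "9%", np.nan], name="pct_intl_student")

stripped = pd.to_numeric(sample.str.strip("%")).div(100)
sliced = pd.to_numeric(sample.str[:-1]).div(100)

# In both results the bare '%' row is missing, just like the real NaN row.
print(stripped.isna().tolist())
print(sliced.isna().tolist())
```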
Upvotes: 7