PEREZje

Reputation: 2492

Change column with string of percent to float pandas dataframe

So I've essentially got this:

,    pct_intl_student
2879      %
2880     9%
2881    NaN
2882     1%
2883    NaN
Name: pct_intl_student, Length: 2884, dtype: object

Would it be possible in some easy way to change all the strings with a percent sign in them to a decimal number? So basically this:

,    pct_intl_student
2879    0
2880    0.09
2881    NaN
2882    0.01
2883    NaN
Name: pct_intl_student, Length: 2884, dtype: object

The NaN values need to stay as NaN, because they will be replaced with the average percentage afterwards. Rows that contain only the string '%' need to become 0.

I tried:

df['pct_intl_student'] = df['pct_intl_student'].str.rstrip('%').astype('float') / 100.0

But this raises this error:

ValueError: could not convert string to float:

So I'm kind of at a loss right now.

Hopefully someone can help me out.

Upvotes: 8

Views: 11754

Answers (2)

pault

Reputation: 43494

Here is an example that better describes your issue:

import numpy as np
import pandas as pd

df = pd.DataFrame({"a": ["9%", "10%", np.nan, '%']})
print(df)
#     a
#0   9%
#1  10%
#2  NaN
#3    %

You want the string % to turn into the value 0.

One way is to change your code to use str.replace instead of str.rstrip. Here I replace each % with .0, so '9%' becomes '9.0' and a bare '%' becomes '.0', which parses as 0.0:

df['a'].str.replace(r'%', r'.0').astype('float') / 100.0
#0    0.09
#1    0.10
#2     NaN
#3    0.00
#Name: a, dtype: float64
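
A variation on the same idea, offered as a sketch rather than as this answer's exact method: strip the trailing percent sign, turn the resulting empty strings into "0", and only then convert to float. The helper name percent_to_fraction is made up for illustration, and the sketch assumes the column holds only strings like '9%', a bare '%', or NaN.

```python
import numpy as np
import pandas as pd


def percent_to_fraction(s: pd.Series) -> pd.Series:
    """Convert strings like '9%' to 0.09; a bare '%' becomes 0.0; NaN stays NaN."""
    # '9%' -> '9', '%' -> '', NaN stays NaN
    stripped = s.str.rstrip("%")
    # An empty string is what made astype(float) fail, so replace it with "0"
    stripped = stripped.mask(stripped == "", "0")
    return stripped.astype(float) / 100.0


df = pd.DataFrame({"a": ["9%", "10%", np.nan, "%"]})
result = percent_to_fraction(df["a"])
print(result)
```

This keeps the intent visible (empty means zero) instead of relying on '.0' happening to parse as a float.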

Upvotes: 7

Scott Boston

Reputation: 153460

Update: to turn rows that contain only '%' into 0, mask them after the conversion:

df['pct_intl_student'] = (pd.to_numeric(df['pct_intl_student'].str[:-1])
                            .div(100)
                            .mask(df['pct_intl_student'] == '%', 0))

Output:

      pct_intl_student
2879              0.00
2880              0.09
2881               NaN
2882              0.01
2883               NaN
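
For anyone who wants to try this without the original data, here is a self-contained sketch of the same mask idea. The sample Series is rebuilt from the rows shown in the question, and errors="coerce" is my own addition (an assumption, not part of the code above): it guarantees the empty string left by a bare '%' becomes NaN before the mask runs, but it would also silently turn any other malformed value into NaN.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question; the real column has 2884 entries
s = pd.Series(
    ["%", "9%", np.nan, "1%", np.nan],
    index=range(2879, 2884),
    name="pct_intl_student",
)

# Remember which rows are a bare '%' before the strings are altered
is_bare_percent = s == "%"

# Strip the sign and convert; unparseable values (including '') become NaN
numeric = pd.to_numeric(s.str.rstrip("%"), errors="coerce") / 100

# Force the bare-'%' rows to 0 and leave the genuine NaN rows alone
converted = numeric.mask(is_bare_percent, 0.0)
print(converted)
```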

Original answer, which does not handle rows that contain only '%':

df['pct_intl_student'] = pd.to_numeric(df['pct_intl_student'].str.strip('%')).div(100)

Or

df['pct_intl_student'] = pd.to_numeric(df['pct_intl_student'].str[:-1]).div(100)

Output:

2880    0.09
2881     NaN
2882    0.01
2883     NaN
Name: pct_intl_student, dtype: float64
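
The question mentions replacing the remaining NaN values with the average percentage afterwards. A minimal sketch of that follow-up step, assuming the column has already been converted to floats (the sample values are the converted rows from the question):

```python
import numpy as np
import pandas as pd

# Converted values for the five rows shown in the question
converted = pd.Series(
    [0.0, 0.09, np.nan, 0.01, np.nan],
    index=range(2879, 2884),
    name="pct_intl_student",
)

# Series.mean() skips NaN by default, so the average uses only the real values
filled = converted.fillna(converted.mean())
print(filled)
```

Note that rows converted from a bare '%' count as 0 in that average, which is why they need to be separated from NaN first.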

Upvotes: 7
