Reputation: 37
One column of my dataset looks like this:
0 10,000+
1 500,000+
2 5,000,000+
3 50,000,000+
4 100,000+
Name: Installs, dtype: object
I want to convert these 'xxx,yyy,zzz+' strings to integers. First I tried this:
df['Installs'] = pd.to_numeric(df['Installs'])
and I got this error:
ValueError: Unable to parse string "10,000" at position 0
Then I tried to remove the '+' and ',' characters like this:
df['Installs'] = df['Installs'].str.replace('+','',regex = True)
df['Installs'] = df['Installs'].str.replace(',','',regex = True)
but nothing changed!
How can I convert these strings to integers?
Upvotes: 0
Views: 734
Reputation: 260640
A bare + is not a valid regex: it is a quantifier, and here it has nothing to repeat. Strip every non-digit character instead:
df['Installs'] = pd.to_numeric(df['Installs'].str.replace(r'\D', '', regex=True))
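For anyone who wants to try this end to end, here is a minimal self-contained sketch. The sample DataFrame is rebuilt from the five rows shown in the question, so it is an assumption about what the real data looks like. Note that \D removes every non-digit character, so it would also strip decimal points and minus signs if the column contained any.

```python
import pandas as pd

# Sample data rebuilt from the rows shown in the question (an assumption
# about the real dataset, which may contain other values).
df = pd.DataFrame(
    {"Installs": ["10,000+", "500,000+", "5,000,000+", "50,000,000+", "100,000+"]}
)

# \D matches any single non-digit character, so both ',' and '+' are removed.
digits_only = df["Installs"].str.replace(r"\D", "", regex=True)

# Convert the cleaned strings to a numeric dtype.
df["Installs"] = pd.to_numeric(digits_only)

print(df["Installs"].tolist())
print(df["Installs"].dtype)
```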
Upvotes: 2
Reputation:
With regex=True, the + (plus) character is interpreted specially, as a regex quantifier. You can either disable regular expression replacement (regex=False) or, even better, change your regular expression to match either + or , and remove both at once:
df['Installs'] = df['Installs'].str.replace('[+,]', '', regex=True).astype(int)
Output:
>>> df['Installs']
0 10000
1 500000
2 5000000
3 50000000
4 100000
Name: 0, dtype: int64
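To compare the two options mentioned above side by side, here is a small sketch. The Series is rebuilt from the rows in the question, and the variable names (installs, literal, char_class) are made up for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question; variable names are illustrative only.
installs = pd.Series(
    ["10,000+", "500,000+", "5,000,000+", "50,000,000+", "100,000+"],
    name="Installs",
)

# Option 1: regex=False treats '+' and ',' as literal text, one call each.
literal = (
    installs.str.replace("+", "", regex=False)
    .str.replace(",", "", regex=False)
    .astype(int)
)

# Option 2: a character class matches either '+' or ',' in a single pass.
# Inside [...] the '+' has no special meaning, so it needs no escaping.
char_class = installs.str.replace("[+,]", "", regex=True).astype(int)

# Both approaches should give the same integers.
print(literal.equals(char_class))
print(char_class.tolist())
```

The character class is usually the more convenient choice when several characters need removing, since it avoids chaining one replace call per character.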
Upvotes: 3