AliReza

Reputation: 37

Pandas: Changing a dataset column from string to integer

One column of my dataset looks like this:

0        10,000+
1       500,000+
2     5,000,000+
3    50,000,000+
4       100,000+
Name: Installs, dtype: object

I want to convert these 'xxx,yyy,zzz+' strings to integers. First I tried this function:

df['Installs'] = pd.to_numeric(df['Installs'])

and I got this error:

ValueError: Unable to parse string "10,000" at position 0

Then I tried to remove '+' and ',' with this method:

df['Installs'] = df['Installs'].str.replace('+','',regex = True)
df['Installs'] = df['Installs'].str.replace(',','',regex = True)

but nothing changed!

How can I convert these strings to integers?

Upvotes: 0

Views: 734

Answers (2)

mozway

Reputation: 260640

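On its own, + is not a valid regex: it is a quantifier with nothing before it to repeat. Strip every non-digit character instead: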

df['Installs'] = pd.to_numeric(df['Installs'].str.replace(r'\D', '', regex=True))
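To illustrate both points, here is a minimal sketch that rebuilds the sample column from the question (the variable names installs, digits_only and converted are only for illustration). It checks that Python's re module rejects a bare + and then applies the \D replacement:

```python
import re

import pandas as pd

# Sample data copied from the question.
installs = pd.Series(
    ["10,000+", "500,000+", "5,000,000+", "50,000,000+", "100,000+"],
    name="Installs",
)

# A bare "+" is a quantifier with nothing to repeat, so the re module rejects it.
try:
    re.compile("+")
    plus_is_valid = True
except re.error:
    plus_is_valid = False

# \D matches any non-digit character, so "," and "+" are removed in one pass.
digits_only = installs.str.replace(r"\D", "", regex=True)

# The cleaned strings contain only digits and now parse as integers.
converted = pd.to_numeric(digits_only)
print(converted)
```

Note that \D also strips minus signs and decimal points, so this approach only suits columns of non-negative whole numbers like this one.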

Upvotes: 2

user17242583

Reputation:

With regex=True, the + (plus) character is interpreted specially, as a regex quantifier. You can either disable regular-expression replacement (regex=False) or, even better, change your regular expression to a character class that matches + or , and removes both at once:

df['Installs'] = df['Installs'].str.replace('[+,]', '', regex=True).astype(int)

Output:

>>> df['Installs']
0       10000
1      500000
2     5000000
3    50000000
4      100000
Name: Installs, dtype: int64
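For comparison, the regex=False option mentioned above can be sketched as follows. This assumes a DataFrame rebuilt from the question's sample data; the names literal and char_class are hypothetical:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {"Installs": ["10,000+", "500,000+", "5,000,000+", "50,000,000+", "100,000+"]}
)

# regex=False treats "+" and "," as literal text, at the cost of two passes.
literal = (
    df["Installs"]
    .str.replace("+", "", regex=False)
    .str.replace(",", "", regex=False)
    .astype(int)
)

# The character class [+,] removes both characters in a single pass.
char_class = df["Installs"].str.replace("[+,]", "", regex=True).astype(int)

# Both approaches should yield the same integers.
print(literal.equals(char_class))
```

Inside a character class, + loses its special meaning, which is why '[+,]' needs no escaping.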

Upvotes: 3
