cleaning numeric columns in pandas

Question

I have some difficulties to exploit csv scraping file in pandas. I have several columns, one of them contain prices as '1 800 €'

After to import csv as dataframe, I can not convert my columns in Integrer

I deleted euro symbol without problem

data['prix']= data['prix'].str.strip('€')

I tried to delete space with the same approach, but the space still remaied

data['prix']= data['prix'].str.strip()
or
data['prix']= data['prix'].str.strip(' ')
or
data['prix']= data['prix'].str.replace(' ', '')

I tried to force the conversion in Int

data['prix']= pd.to_numeric(data['prix'], errors='coerce')

My column was fill by Nan value

I tried to convert before operation of replace space in string

data = data.convert_dtypes(convert_string=True)

But same result : impossible to achieve my aim the spaces are always present and I can not convert in integer

I looked with Excel into dataset, I can not identify special problem in the data

I tried also to change encoding standard in read_csv ... ditto

In this same dataset I had the same problem for the kilometrage as 15 256 km And I had no problem to retreat and convert to int ...

I would like to test through REGEX to copy only numbers of the field et create new column with

How to proceed ? I am also interested by other ideas

Thank you

Corralien · Accepted Answer

Use str.findall:

I would like to test through REGEX to copy only numbers of the field et create new column with

data['prix2'] = data['prix'].str.findall(r'\d+').str.join('').astype(int)

# Or if it raises an exception

data['prix2'] = pd.to_numeric(data['prix'].str.findall('(\d+)').str.join(''), errors='coerce')

cleaning numeric columns in pandas

Answers (2)

Related Questions