Reputation: 27
I have some difficulties to exploit csv scraping file in pandas. I have several columns, one of them contain prices as '1 800 €'
After to import csv as dataframe, I can not convert my columns in Integrer
I deleted euro symbol without problem
data['prix']= data['prix'].str.strip('€')
I tried to delete space with the same approach, but the space still remaied
data['prix']= data['prix'].str.strip()
or
data['prix']= data['prix'].str.strip(' ')
or
data['prix']= data['prix'].str.replace(' ', '')
I tried to force the conversion in Int
data['prix']= pd.to_numeric(data['prix'], errors='coerce')
My column was fill by Nan value
I tried to convert before operation of replace space in string
data = data.convert_dtypes(convert_string=True)
But same result : impossible to achieve my aim the spaces are always present and I can not convert in integer
I looked with Excel into dataset, I can not identify special problem in the data
I tried also to change encoding standard in read_csv ... ditto
In this same dataset I had the same problem for the kilometrage as 15 256 km And I had no problem to retreat and convert to int ...
I would like to test through REGEX to copy only numbers of the field et create new column with
How to proceed ? I am also interested by other ideas
Thank you
Upvotes: 2
Views: 1062
Reputation: 762
To delete the white space use this line:
data['prix']= data['prix'].str.replace(" ","")
and to convert the string into a int use this line:
data['prix'] = [int(i) for i in data['prix']]
Upvotes: 0
Reputation: 120409
Use str.findall
:
I would like to test through REGEX to copy only numbers of the field et create new column with
data['prix2'] = data['prix'].str.findall(r'\d+').str.join('').astype(int)
# Or if it raises an exception
data['prix2'] = pd.to_numeric(data['prix'].str.findall('(\d+)').str.join(''), errors='coerce')
Upvotes: 1