python - removing whitespaces not working - encoding issue?

Question

I have a dataset which contains data scraped from a real estate agency. I want to clean the variables from special characters (€, m², etc), remove whitespaces and convert them into integers. For instance I have these for the price (Prix de vente) and size (Surface habitable):

I manage to strip the "€" from the price variable with the following:

data["Prix de vente"] = data["Prix de vente"].str.strip("€")

However none of these two techniques work to remove the whitespaces

data["Prix de vente"] = data["Prix de vente"].str.strip() 
data["Prix de vente"] = data["Prix de vente"].replace(" ","")

I keep having:

I have a feeling this is due to the way my data is encoded. Would that be the issue? And if so, how do I deal with it? I'm using a jupyter notebook.

Peter · Accepted Answer

You can do

data["Prix de vente"] = data["Prix de vente"].str.replace("\s","", regex=True)

Matching "\s" combined with regex=True instead of " " makes sure to match not only regular spaces but any whitespace character

More background:

Your first approach

data["Prix de vente"] = data["Prix de vente"].str.strip()

does not work, since .strip() only removes leading and trailing characters.

Your second approach

data["Prix de vente"] = data["Prix de vente"].replace(" ","")

does not work, since it is using not str.replace, but pd.Series.replace, which only replaces exact matches of values (e.g. "900 000").

python - removing whitespaces not working - encoding issue?

Answers (2)

Related Questions