David Collins

Reputation: 900

Replacing a Unicode character in text in a pandas DataFrame

I have this dataframe:

>>> df
                   Temp
0   before 1.5° C after
1     before 2° C after
2    before 2°  C after
3  before 1.5°  C after

I apply this replace method:

newdf = df.replace(r'(?P<quote>\d[.]*[\d]*)(?u:00B0)\s+C', '(?P=quote)'r'C')

The dataframe remains unchanged. However, I want it to look like this:

>>> newdf
               Temp
0 before 1.5C after
1   before 2C after
2   before 2C after
3 before 1.5C after

I've also tried newdf = df.replace(r'°\s+','') but that also doesn't change the dataframe.

Upvotes: 0

Views: 915

Answers (2)

wwnde

Reputation: 26676

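Another way: remove every character that is not a word character (letter, digit, or underscore) or a period. This also removes whitespace.

df["New"] = df["Temp"].str.replace(r"[^\w.]", "", regex=True)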

Upvotes: 0

Rakesh

Reputation: 82785

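Use the pattern r"[^\d.C]" to remove everything except digits, the decimal point, and C.

Example:

# regex=True is required in pandas 2.0+, where str.replace defaults to literal matching
df["New"] = df["Temp"].str.replace(r"[^\d.C]", "", regex=True)
# OR: remove only the degree sign (and any following whitespace) that comes after a digit
df["New"] = df["Temp"].str.replace(r"(?<=\d)(°\s*)", "", regex=True)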
print(df)

Output:

      Temp   New
0   1.5° C  1.5C
1     2° C    2C
2     2° C    2C
3  1.5°  C  1.5C
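
As for why the original df.replace calls left the frame unchanged: without regex=True, DataFrame.replace only replaces cells whose entire value equals the pattern string, so nothing matched. The following is a minimal sketch using the data from the question. The pattern and the numbered backreference are my own choice here, not the asker's exact expression (in Python's re syntax a named group would be referenced in the replacement as \g<quote>, not (?P=quote)).

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({"Temp": [
    "before 1.5° C after",
    "before 2° C after",
    "before 2°  C after",
    "before 1.5°  C after",
]})

# Without regex=True the pattern is compared against whole cell values,
# so no cell matches and the frame comes back unchanged.
unchanged = df.replace(r"°\s+", "")

# With regex=True the pattern is applied inside each string.
# Group 1 captures the number; \1 puts it back in front of "C".
newdf = df.replace(r"(\d+(?:\.\d+)?)°\s*C", r"\1C", regex=True)
print(newdf)
```

This keeps the surrounding "before" and "after" text intact, which the character-class patterns above do not.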

Upvotes: 1
