Replacing unicode from a text in a pandas dataframe

Question

I have this dataframe:

>>> df
                   Temp
0   before 1.5° C after
1     before 2° C after
2    before 2°  C after
3  before 1.5°  C after

I apply this replace method:

newdf = df.replace(r'(?P<quote>\d[.]*[\d]*)(?u:00B0)\s+C', '(?P=quote)'r'C')

The dataframe remains unchanged. However, I want it to look like this:

>>> newdf
               Temp
0 before 1.5C after
1   before 2C after
2   before 2C after
3 before 1.5C after

I've also tried newdf = df.replace(r'°\s+','') but that also doesn't change the dataframe.

These other questions don't quite cover my case:

removing unicode from text in pandas: I don't want to remove all Unicode characters, just this one when it is followed by zero or more spaces.
Replacing Unicode character in pandas Dataframe column: I need the regex for zero or more spaces.
I can't just remove the Unicode character first, because ° is the indicator for where a change needs to happen.

Rakesh · Accepted Answer

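Use the pattern r"[^\d.C]" to remove everything except digits, the decimal point, and C. Note that this also strips any surrounding text, so it only fits cells that hold nothing but the temperature; the second pattern below removes only the degree sign and the whitespace after it.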
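As a quick check outside pandas, here is a minimal sketch using Python's built-in re module on two strings copied from the question. It runs the character class next to the lookbehind alternative that appears in the example that follows. The names strip_all and strip_degree are only illustrative.

```python
import re

# Sample strings copied from the question's dataframe.
samples = ["before 1.5° C after", "before 2°  C after"]

# Character class: removes every character that is not a digit, '.', or 'C'.
strip_all = re.compile(r"[^\d.C]")
# Lookbehind: removes only a degree sign (and any whitespace after it)
# that directly follows a digit.
strip_degree = re.compile(r"(?<=\d)°\s*")

for text in samples:
    print(repr(strip_all.sub("", text)), "|", repr(strip_degree.sub("", text)))
```

The first pattern reduces each string to just the number and C, while the second keeps the surrounding words and only closes the gap between the number and C.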

Ex:

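# regex=True is needed on pandas 2.0+, where str.replace treats the pattern as a literal string by default
df["New"] = df["Temp"].str.replace(r"[^\d.C]", "", regex=True)
#OR
df["New"] = df["Temp"].str.replace(r"(?<=\d)(°\s*)", "", regex=True)
print(df)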

Output:

      Temp   New
0   1.5° C  1.5C
1     2° C    2C
2     2° C    2C
3  1.5°  C  1.5C
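For the dataframe as posted in the question (with text before and after the temperature), the attempts there most likely fail because DataFrame.replace compares whole cell values unless regex=True is passed. The following is a sketch under that assumption, not the only way to do it. It uses the simpler pattern r"°\s*" (degree sign plus zero or more whitespace characters) instead of the lookbehind, and the dataframe is rebuilt by hand from the question.

```python
import pandas as pd

# Dataframe rebuilt by hand from the question.
df = pd.DataFrame({"Temp": ["before 1.5° C after",
                            "before 2° C after",
                            "before 2°  C after",
                            "before 1.5°  C after"]})

# Degree sign followed by zero or more whitespace characters.
pattern = r"°\s*"

# Without regex=True nothing changes: no cell is exactly equal to the pattern string.
unchanged = df.replace(pattern, "")

# With regex=True the pattern is applied as a substitution inside each string cell.
newdf = df.replace(pattern, "", regex=True)

# Column-level equivalent using the .str accessor.
df["New"] = df["Temp"].str.replace(pattern, "", regex=True)

print(newdf)
```

Either call leaves the surrounding words intact and turns "1.5° C" into "1.5C".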
