pandas: remove dot only if it occurs after a digit in a string

Question

I have a dataframe that looks like the following:

import pandas as pd
df = pd.DataFrame(["I", "have", "5.", "apples", "."],
                  columns=['words'])

I only want to remove the dot that follows the number (5. --> 5), not the dot at the end of the sentence.

I tried

df["Words"].str.replace("\d.", "\d", regex=True)

but it raises an error.

CDJB · Accepted Answer

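The following should work. We need a capturing group in the regex so that the matched digit can be written back in the replacement (\1), and the dot must be escaped as \. because an unescaped . matches any character. In addition, we use raw-string literals (r"...") so that the backslashes reach the regex engine unchanged instead of being interpreted as Python string escapes.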
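To see the capturing group and backreference in isolation, here is a minimal sketch using Python's standard re module on the same list of words, without pandas. It also illustrates the two problems with the original attempt: an unescaped dot matches any character, and "\d" is not a valid replacement template. The variable names are illustrative only.

```python
import re

words = ["I", "have", "5.", "apples", "."]

# (\d) captures a single digit as group 1; \. matches a literal dot only.
pattern = r"(\d)\."
# \1 writes the captured digit back, so only the dot after it is dropped.
cleaned = [re.sub(pattern, r"\1", w) for w in words]
print(cleaned)  # ['I', 'have', '5', 'apples', '.']

# Without the backslash, "." matches ANY character, so "55" loses its
# second digit because the dot pattern matched it.
unescaped = re.sub(r"(\d).", r"\1", "55")
print(unescaped)  # 5

# "\d" has no meaning in a replacement template, so re.sub raises re.error.
try:
    re.sub(r"\d.", r"\d", "5.")
    raised = False
except re.error as exc:
    raised = True
    print("re.error:", exc)
```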

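>>> import pandas as pd
>>> df = pd.DataFrame(["I", "have", "5.", "apples", "."],
...                   columns=['words'])
>>> # regex=True is required on pandas >= 2.0, where the default became False
>>> df["words"].str.replace(r"(\d)\.", r"\1", regex=True)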
0         I
1      have
2         5
3    apples
4         .
Name: words, dtype: object
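As an alternative (a different technique from the one above, not part of the accepted answer), a lookbehind assertion can require the preceding digit without capturing it, so the replacement is simply an empty string. This is a minimal sketch assuming pandas 2.x; the Series named decimals is a hypothetical example added to show an edge case.

```python
import pandas as pd

df = pd.DataFrame(["I", "have", "5.", "apples", "."], columns=["words"])

# (?<=\d) is a lookbehind: it requires a digit immediately before the dot
# but does not consume it, so nothing needs to be written back.
# regex=True is passed explicitly because pandas 2.0 made False the default.
result = df["words"].str.replace(r"(?<=\d)\.", "", regex=True)
print(result.tolist())  # ['I', 'have', '5', 'apples', '.']

# Edge case: an unanchored pattern also removes a dot inside a number.
decimals = pd.Series(["5.", "5.5"])
unanchored = decimals.str.replace(r"(?<=\d)\.", "", regex=True).tolist()
# Anchoring with $ only removes a dot at the end of the string.
anchored = decimals.str.replace(r"(?<=\d)\.$", "", regex=True).tolist()
print(unanchored)  # ['5', '55']
print(anchored)    # ['5', '5.5']
```

If the data may contain decimal numbers such as "5.5", the $-anchored form keeps them intact while still stripping a trailing dot after a digit.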
