tina.mario

Reputation: 67

pandas: remove dot only if it occurs after a digit in a string

I have a DataFrame that looks like the following:

df = pd.DataFrame(["I", "have", "5.", "apples", "."],
                  columns=['words'])

I only want to remove the dot that follows the number, not the dot at the end of the sentence (5. --> 5).

I tried

df["Words"].str.replace("\d.", "\d", regex=True)

but it raises an error.

Upvotes: 3

Views: 1529

Answers (2)

CDJB

Reputation: 14506

The following should work. We need a capturing group in the regex so that the matched digit can be put back via \1 in the replacement. In addition, we use raw-string literals so that Python does not interpret the backslashes in the pattern and the replacement as escape sequences.

>>> df = pd.DataFrame(["I", "have", "5.", "apples", "."],
                  columns=['words'])
>>> df["words"].str.replace(r"(\d)\.", r"\1", regex=True)  # regex=True is required since pandas 2.0
0         I
1      have
2         5
3    apples
4         .
Name: words, dtype: object
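
As an alternative to the capturing group, a lookbehind can match only the dot while requiring a digit in front of it, so the replacement is just an empty string. The sketch below is not from the original answer; it checks both patterns with Python's built-in re module on the question's sample words, on the assumption that pandas applies the same regex semantics per element when regex=True is set on an object-dtype column.

```python
import re

# Sample data from the question, as a plain list instead of a DataFrame column.
words = ["I", "have", "5.", "apples", "."]

# Capturing group: match a digit plus a literal dot, then put the digit back.
with_group = [re.sub(r"(\d)\.", r"\1", w) for w in words]

# Lookbehind: match only the dot, provided a digit comes right before it.
with_lookbehind = [re.sub(r"(?<=\d)\.", "", w) for w in words]

print(with_group)
print(with_lookbehind)
```

Both lists come out the same, with "5." turned into "5" and the lone "." left alone. The pandas equivalent of the second form would be df["words"].str.replace(r"(?<=\d)\.", "", regex=True).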

Upvotes: 2

sampers

Reputation: 473

We need

df["words"].str.replace(r"^(\d+)\.$", r"\1", regex=True)

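This matches multi-digit numbers as well, and because the dot is escaped it matches only a literal dot rather than any character. The ^ and $ anchors restrict the replacement to strings that consist solely of digits followed by a dot.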

CDJB's answer is not entirely correct. With an unescaped dot, as in the call below, the pattern matches a digit followed by any character:

df = pd.DataFrame(["I", "have", "50a", "apples", "."],
                  columns=['words'])
In [12]: df["words"].str.replace(r"(\d).", r"\1", regex=True)
Out[12]:
0         I
1      have
2        5a
3    apples
4         .
Name: words, dtype: object
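
To make the difference between the three patterns in this thread concrete, here is a small comparison using Python's built-in re module. The sample strings are made up for illustration and are not from the question; the sketch assumes pandas behaves the same way per element with regex=True on an object-dtype column.

```python
import re

# Hypothetical sample strings chosen to expose the differences.
samples = ["5.", "50.", "50a", "3.14", "."]

patterns = {
    "unescaped": r"(\d).",     # digit followed by ANY character
    "escaped": r"(\d)\.",      # digit followed by a literal dot, anywhere
    "anchored": r"^(\d+)\.$",  # whole string is digits plus one trailing dot
}

# Apply each pattern to every sample, keeping the captured digits.
results = {
    name: [re.sub(pat, r"\1", s) for s in samples]
    for name, pat in patterns.items()
}

for name, out in results.items():
    print(f"{name:>9}: {out}")
```

The unescaped pattern damages "50.", "50a" and "3.14". The escaped pattern leaves "50a" alone but still turns "3.14" into "314". The anchored pattern only touches "5." and "50.", so which one is right depends on whether decimals like "3.14" can occur in the column.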

Upvotes: 1
