Reputation: 67
I have a DataFrame that looks like the following:
df = pd.DataFrame(["I", "have", "5.", "apples", "."],
columns=['words'])
and I only want the dot following the number to be removed and not the dot at the end of the sentence. (5. --> 5)
I tried
df["Words"].str.replace("\d.", "\d", regex=True)
but it raises an error.
Upvotes: 3
Views: 1529
Reputation: 14506
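The following should work. We need a capturing group in the regex so that the replacement can refer back to the digit that was matched, and we need to escape the dot (`\.`) so that it matches a literal dot rather than any character. In addition, we use raw-string literals so that the backslashes reach the regex engine unchanged instead of being interpreted by Python first.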
>>> df = pd.DataFrame(["I", "have", "5.", "apples", "."],
columns=['words'])
>>> df["words"].str.replace(r"(\d)\.", r"\1", regex=True)
0 I
1 have
2 5
3 apples
4 .
Name: words, dtype: object
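To see why the attempt in the question fails and what the capturing group does, here is a minimal sketch using only the standard `re` module on a plain list, rather than pandas. It assumes the column name is spelled correctly in the real code and that the error comes from the replacement string; `words`, `error_message` and `cleaned` are just illustrative names.

```python
import re

words = ["I", "have", "5.", "apples", "."]

# \d is a character class in a pattern, but it has no meaning in a
# replacement string, so the regex engine rejects it.
error_message = None
try:
    re.sub(r"\d.", r"\d", "5.")
except re.error as exc:
    error_message = str(exc)
print("attempt from the question fails with:", error_message)

# (\d) captures the digit, \. matches a literal dot, \1 writes the digit back.
cleaned = [re.sub(r"(\d)\.", r"\1", w) for w in words]
print(cleaned)  # ['I', 'have', '5', 'apples', '.']
```

Note that in pandas 2.0 and later, `Series.str.replace` treats the pattern as a literal string by default, so `regex=True` has to be passed explicitly for the pattern to be applied as a regular expression.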
Upvotes: 2
Reputation: 473
We need
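df["words"].str.replace(r"^(\d+)\.$", r"\1", regex=True)
This matches numbers with more than one digit as well (`\d+`), the escaped `\.` makes sure the last character is a literal dot rather than any character, and the anchors `^` and `$` restrict the match to entries that consist only of a number followed by a dot.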
The answer of CDJB is not entirely correct if the dot is left unescaped, because `.` then matches any character:
df = pd.DataFrame(["I", "have", "50a", "apples", "."],
columns=['words'])
In [12]: df["words"].str.replace(r"(\d).", r"\1", regex=True)
Out[12]:
0 I
1 have
2 5a
3 apples
4 .
Name: words, dtype: object
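For comparison, here is a rough sketch that runs the three patterns from this thread against a few edge cases with the standard `re` module. The sample strings and the names `samples`, `patterns` and `results` are made up for illustration; with `regex=True`, pandas should behave the same way on each element.

```python
import re

samples = ["5.", "50.", "50a", "3.14", "."]

patterns = {
    "unescaped": r"(\d).",     # dot matches any character
    "escaped": r"(\d)\.",      # literal dot, single digit, anywhere in the string
    "anchored": r"^(\d+)\.$",  # whole entry must be digits followed by one dot
}

results = {
    name: [re.sub(pat, r"\1", s) for s in samples]
    for name, pat in patterns.items()
}

for name, cleaned in results.items():
    print(f"{name:>9}: {cleaned}")
# unescaped: ['5', '5.', '5a', '31', '.']
#   escaped: ['5', '50', '50a', '314', '.']
#  anchored: ['5', '50', '50a', '3.14', '.']
```

Only the anchored pattern leaves `3.14` and `50a` untouched, which matters if the column can contain decimals or mixed tokens.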
Upvotes: 1