stackoverflowuser2010

Reputation: 40899

Python string replace with regex substitution variable

I have the following string, where the substring 2.5 was incorrectly formed: 'It costs 2. 5. That is a lot.'

How do I remove the space between the 2. and the 5?

I tried:

s = 'It costs 2. 5. That is a lot.'
s = s.replace('. ', '.')
print(s) # It costs 2.5.That is a lot.

However, that also removes the correctly placed space between the 5. and the T. I think I'm looking for a sed-style regex substitution variable (a backreference), like s/\. \([0-9]\)/.\1/g. How do I do that in Python?

Upvotes: 2

Views: 145

Answers (3)

The fourth bird

Reputation: 163277

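If the text after the first number can itself start with a digit (as in "2. 5 items"), you can require the dot after the second number as well, so that only a split decimal such as "2. 5." is joined.

If you don't want the match to span a line break, match whitespace characters other than newlines with [^\S\r\n] instead of \s.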

\b(\d+\.)[^\S\r\n]+(\d+\.)

Explanation

  • \b A word boundary
  • (\d+\.) Capture group 1, match 1+ digits and a dot
  • [^\S\r\n]+ Match 1+ whitespace chars without a newline
  • (\d+\.) Capture group 2, match 1+ digits and a following dot

In the replacement use group 1 and group 2.

For example

import re
s = ("It costs 2. 5. That is a lot.\n"
    "It costs 2. 5 items, that is a lot.")
pattern = r"\b(\d+\.)[^\S\r\n]+(\d+\.)"
print(re.sub(pattern, r"\1\2", s))

Output

It costs 2.5. That is a lot.
It costs 2. 5 items, that is a lot.
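
To illustrate why [^\S\r\n]+ is used rather than \s+, here is a minimal sketch comparing the two on a made-up string (the sample text and variable names are assumptions for illustration, not from the question). With \s+, a number ending one line is glued to a number starting the next:

```python
import re

# Hypothetical sample: "1." ends one line and "2." starts the next one.
text = "It costs 2. 5. Fine.\nStep 1.\n2. Next step."

# Whitespace except \r and \n: the match cannot cross a line break.
same_line_only = re.sub(r"\b(\d+\.)[^\S\r\n]+(\d+\.)", r"\1\2", text)

# Plain \s+ also matches the newline, so "1.\n2." is joined as well.
any_whitespace = re.sub(r"\b(\d+\.)\s+(\d+\.)", r"\1\2", text)

print(same_line_only)
print(any_whitespace)
```

In the first result the line break between "Step 1." and "2. Next step." survives; in the second it is removed.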

Upvotes: 1

Robo Mop

Reputation: 3553

How about this:

(?<=\d\.)\s+(?=\d)

As seen here at regex101.com

Explanation

I'm making use of a positive lookbehind and a positive lookahead. Together they tell the regex to match one or more whitespace characters (\s+) that are preceded by a digit and a period (given by (?<=\d\.)) and followed by a digit (given by (?=\d)). Lookarounds consume no characters, so only the whitespace itself is matched and replaced with the empty string.

Lookaheads and lookbehinds are useful in many problems, so I suggest you learn more about them.


Python Implementation

import re

s = 'It costs 2. 5. That is a lot.'
s = re.sub(r"(?<=\d\.)\s+(?=\d)", "", s)
print(s) # It costs 2.5. That is a lot.
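
As a rough sketch of how the lookaround pattern behaves on other inputs, it can be wrapped in a small helper. The function name join_split_decimals and the extra sample strings are hypothetical, added only for illustration:

```python
import re

# Zero-width lookarounds: only the whitespace between "digit." and a digit
# is matched, so nothing has to be captured and put back.
SPLIT_DECIMAL = re.compile(r"(?<=\d\.)\s+(?=\d)")

def join_split_decimals(text):
    """Hypothetical helper: remove whitespace between 'digit.' and a digit."""
    return SPLIT_DECIMAL.sub("", text)

original = join_split_decimals("It costs 2. 5. That is a lot.")
multi_digit = join_split_decimals("It costs 12.   75 today.")
untouched = join_split_decimals("See item 3. Then stop.")

print(original)     # It costs 2.5. That is a lot.
print(multi_digit)  # It costs 12.75 today.
print(untouched)    # See item 3. Then stop.
```

Note that \s also matches newlines, so a number at the end of one line would be joined to a digit at the start of the next.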

Upvotes: 2

Selcuk

Reputation: 59184

You can use a regex:

>>> import re
>>> s = 'It costs 2. 5. That is a lot.'
>>> re.sub(r"(\d+)\. (\d+)", r"\g<1>.\g<2>", s)
'It costs 2.5. That is a lot.'
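
One detail worth checking in a pattern like this: an unescaped . matches any character, not just a literal period. The sketch below compares the two spellings on a made-up string (the "Room 2b 5" text is an assumption chosen to expose the difference). Raw strings are used so the backslashes reach the regex engine unchanged:

```python
import re

# Hypothetical input containing "2b 5", which is not a split decimal.
s = "Room 2b 5 costs 2. 5. That is a lot."

# Unescaped dot: "." matches the "b", so "2b 5" is rewritten too.
unescaped = re.sub(r"(\d+). (\d+)", r"\g<1>.\g<2>", s)

# Escaped dot: only a literal period between the digits and the space matches.
escaped = re.sub(r"(\d+)\. (\d+)", r"\g<1>.\g<2>", s)

print(unescaped)  # Room 2.5 costs 2.5. That is a lot.
print(escaped)    # Room 2b 5 costs 2.5. That is a lot.
```

The \g<1> form is equivalent to \1 here; it is mainly useful when the group reference is followed directly by a digit, where \1 would be ambiguous.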

Upvotes: 1
