marin

Reputation: 953

Problem replacing hour expressions in a string

I have a problem replacing text in a string: I want to change every occurrence of 2h / 2 h / 2 heure / 2heure / 2 heures / 2heures to #hour. I tried:

import re

text = "I should leave the house at 16h45 but I am late and I should not be arriving between 2 h or 3h or maybe 4heures"
hour = re.compile(r'[0-9]+\s?(h|heures?)([0-9]+)?')
replaces = hour.sub('#hour', text)
print(replaces)

Output:

I should leave the house at #hour but I am late and I should not be arriving between #hour or #hour or maybe #houreures

Good output:

I should leave the house at #hour but I am late and I should not be arriving between #hour or #hour or maybe #hour

How can I fix this so that 4heures does not become #houreures?

Upvotes: 2

Views: 57

Answers (4)

The fourth bird

Reputation: 163277

You need to swap the order of the alternatives. Alternation is tried from left to right, and since the rest of your pattern is optional, the h alternative succeeds before heures? is ever attempted.

For example, in 4heures your regex first matches one or more digits with [0-9]+. Then, in the alternation (h|heures?), it matches only the h from heures. The matched 4h is replaced with #hour, and the leftover eures gives you #houreures.
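A quick way to see this is to look at what the first capturing group actually matched. The sketch below compares the question's pattern with the swapped version on the input 4heures, using the standard re.search and Match.group calls:

```python
import re

original = re.compile(r'[0-9]+\s?(h|heures?)([0-9]+)?')
swapped = re.compile(r'[0-9]+\s?(heures?|h)([0-9]+)?')

m1 = original.search("4heures")
m2 = swapped.search("4heures")

# The original pattern stops after "4h": the h alternative succeeded,
# so heures? was never tried and "eures" is left unmatched.
print(m1.group(0), m1.group(1))  # 4h h

# With heures? tried first, the whole word is consumed.
print(m2.group(0), m2.group(1))  # 4heures heures
```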

[0-9]+\s?(heures?|h)([0-9]+)?

import re

text = "I should leave the house at 16h45 but I am late and I should not be arriving between 2 h or 3h or maybe 4heures"
hour = re.compile(r'[0-9]+\s?(heures?|h)([0-9]+)?')
replaces = hour.sub('#hour', text)
print(replaces)


Upvotes: 1

Andrej Kesely

Reputation: 195418


import re

text = "I should leave the house at 16h45 but I am late and I should not be arriving between 2 h or 3h or maybe 4heures"

s = re.sub(r'\d+\s*[h]?(eure)*[s]?\d*', '#hour', text)
print(s)

Output:

I should leave the house at #hour but I am late and I should not be arriving between #hour or #hour or maybe #hour
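One thing worth checking with this pattern: because both h and eure are optional, it also matches a bare number together with any whitespace after it. A small check, using sentences that are illustrative and not from the question:

```python
import re

pattern = r'\d+\s*[h]?(eure)*[s]?\d*'

# Time expressions are replaced as intended...
print(re.sub(pattern, '#hour', "between 2 h or 4heures"))

# ...but a plain number is replaced too, and the space after it is consumed.
print(re.sub(pattern, '#hour', "I have 3 apples"))
```

If the text can contain numbers that are not hours, a pattern that requires the h is safer.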

Upvotes: 1

Wiktor Stribiżew

Reputation: 626748

The h alternative matched the h in heures, so the heures? alternative was never tried. Swapping the alternatives can solve the problem, but it is a better idea to use an optional non-capturing group instead (see solution below).

There is no need for the capturing parentheses in the pattern; I suggest removing them (or, if you want to keep the alternation, converting it to a non-capturing group).
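Capturing groups do not change what sub replaces, but they do change what some other re functions return. For example, re.findall returns the group contents instead of the whole match when the pattern contains groups. The sample text below is a shortened, illustrative version of the question's input:

```python
import re

text = "at 16h45, between 2 h or 4heures"

capturing = r'[0-9]+\s?(heures?|h)([0-9]+)?'
non_capturing = r'[0-9]+\s?(?:heures?|h)[0-9]*'

# With two capturing groups, findall returns one tuple of group values per match
# (an optional group that did not participate comes back as '').
print(re.findall(capturing, text))

# Without capturing groups, findall returns the full matched text.
print(re.findall(non_capturing, text))
```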

Besides, the ([0-9]+)? pattern can be simplified to [0-9]*.
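Both forms accept exactly the same strings; the only difference is that ([0-9]+)? also creates a capturing group. A quick check on a few hypothetical inputs:

```python
import re

with_group = re.compile(r'[0-9]+\s?h(?:eures?)?([0-9]+)?')
simplified = re.compile(r'[0-9]+\s?h(?:eures?)?[0-9]*')

# Hypothetical inputs; "h45" has no leading digits and should fail both patterns.
samples = ["16h45", "2 h", "3h", "4heures", "10 heure30", "h45"]

for s in samples:
    a = with_group.fullmatch(s) is not None
    b = simplified.fullmatch(s) is not None
    print(f"{s!r}: {a} {b}")  # the two columns always agree
```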

You may use

[0-9]+\s?h(?:eures?)?[0-9]*


Details

  • [0-9]+ - one or more digits
  • \s? - zero or one whitespace character
  • h - the letter h
  • (?:eures?)? - an optional non-capturing group that matches 1 or 0 occurrences of eure or eures
  • [0-9]* - 0 or more digits.
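To confirm the pattern covers every form listed in the question (2h, 2 h, 2 heure, 2heure, 2 heures, 2heures), you can replace each one in isolation:

```python
import re

hour = re.compile(r'[0-9]+\s?h(?:eures?)?[0-9]*')

forms = ["2h", "2 h", "2 heure", "2heure", "2 heures", "2heures"]
for form in forms:
    # Each form should be consumed entirely, leaving only the placeholder.
    print(repr(form), "->", hour.sub('#hour', form))
```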

See the Python demo:

import re
text = "I should leave the house at 16h45 but I am late and I should not be arriving between 2 h or 3h or maybe 4heures"
hour = re.compile(r'[0-9]+\s?h(?:eures?)?[0-9]*')
replaces = hour.sub('#hour', text)
print(replaces)
# => I should leave the house at #hour but I am late and I should not be arriving between #hour or #hour or maybe #hour

Upvotes: 2

Wololo

Reputation: 861

Change the order of heures and h inside the parentheses, like this:

[0-9]+\s?(heures?|h)([0-9]+)? should work.

With (h|heures?), the regex engine tries h first and only tries heures? if h fails. The thing is, whenever heures is present, h is present too (it's the first character of heures), so h always wins. You need to reverse the order: search for heures first and, only if that is not present, search for h. Replacing (h|heures?) with (heures?|h) solves the problem.
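The same rule applies if the alternatives come from a list: putting longer words first keeps a short word from shadowing a longer one that starts with it. A minimal sketch; longest_first_alternation is a hypothetical helper, not part of the re module:

```python
import re


def longest_first_alternation(words):
    """Hypothetical helper: build a non-capturing alternation that tries
    longer words first, so h never shadows heure or heures."""
    ordered = sorted(words, key=len, reverse=True)
    return "(?:" + "|".join(re.escape(w) for w in ordered) + ")"


units = ["h", "heure", "heures"]
alternation = longest_first_alternation(units)
print(alternation)  # (?:heures|heure|h)

hour = re.compile(r'[0-9]+\s?' + alternation + r'[0-9]*')
print(hour.sub('#hour', "between 2 h or 3h or maybe 4heures"))
```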

Upvotes: 2
