I have a list of strings, some of which contain numbers.
For instance:
I want to remove all numbers from those strings but keep specific numbers, such as 32 and 64, so the cleanup will return this:
Note that in the first example (def3464) the number 64 is present, but not on its own, so it should be removed.
Any ideas?
You can do the task even without lambdas, relying solely on the regex capabilities (although the regex is more complicated).
The regex needed is: (?:(32|64)|\d+)(?=\D|$)
Details:
- (?: - start of the non-capturing group, needed as a container for the alternatives.
- (32|64) - the first alternative (and capturing group), either 32 or 64.
- | - or.
- \d+ - the second alternative, a sequence of digits.
- ) - end of the non-capturing group.
- (?=\D|$) - the (common) ending part after both alternatives: a positive lookahead for either a non-digit char or the end of the string.

The first alternative (and capturing group) matches either 32 or 64, and the second alternative (without a capturing group) matches any other number. Because matching starts at the leftmost digit of each run and the lookahead forbids a following digit, a run survives only if it is exactly 32 or 64: in def3464 the match starts at 3, where neither 32 nor 64 fits, so \d+ consumes the whole run 3464.
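To see which alternative fires for each run of digits, here is a small sketch using re.finditer that reports every match and whether group 1 captured it. The helper name classify and the extra sample string 'x3264' are illustrative additions, not part of the original answer.

```python
import re

# Same pattern as in the answer: group 1 captures 32 or 64,
# while \d+ catches any other run of digits.
PATTERN = re.compile(r"(?:(32|64)|\d+)(?=\D|$)")

def classify(text):
    """Return (matched_text, 'kept' or 'removed') for every match in text."""
    result = []
    for m in PATTERN.finditer(text):
        # group(1) is None when the second alternative (\d+) matched
        status = 'kept' if m.group(1) is not None else 'removed'
        result.append((m.group(), status))
    return result

for text in ['abc123 def3464', 'hello32 goodbye64', 'x3264']:
    print(text, classify(text))
```

In 'x3264' the first alternative matches 32, but the lookahead then sees the digit 6 and fails, so the engine falls back to \d+ and the whole run 3264 is removed.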
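The replacement expression is \1 (replace the match with the content of the first capturing group).
So, if the second alternative matched, the first group matched nothing, and the current match is replaced with nothing. (Since Python 3.5, re.sub substitutes an empty string for an unmatched group; older versions raise an "unmatched group" error instead.)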
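A function replacement makes the empty-string fallback explicit, so it does not rely on the Python 3.5+ handling of unmatched groups in r"\1". This is a sketch with the same pattern; keep_group is a hypothetical helper name.

```python
import re

PATTERN = re.compile(r"(?:(32|64)|\d+)(?=\D|$)")

def keep_group(m):
    # group(1) is None when \d+ matched, so fall back to an empty string;
    # this mirrors what r"\1" does on Python 3.5+
    return m.group(1) or ''

src = ['abc123 def3464', 'hello32 goodbye64', 'some numbers 1254324']
result = [PATTERN.sub(keep_group, s) for s in src]
print(result)  # ['abc def', 'hello32 goodbye64', 'some numbers ']
```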
To demonstrate how it works, run the example program:
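import re
src = ['abc123 def3464', 'hello32 goodbye64', 'some numbers 1254324']
print(src)
# \1 keeps 32 or 64; any other number is replaced with '' (group 1 did not match)
result = [re.sub(r"(?:(32|64)|\d+)(?=\D|$)", r"\1", i) for i in src]
print(result)  # ['abc def', 'hello32 goodbye64', 'some numbers ']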
If you are unhappy with the trailing space in the last output string ('some numbers '), add .strip() after re.sub(...).
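Note that .strip() only trims the ends of each string; if a removed number sat between two spaces, a double space remains inside. Here is a sketch comparing .strip() with ' '.join(s.split()), which also collapses inner runs of whitespace. The extra input 'a 123 b' is made up for illustration.

```python
import re

PATTERN = re.compile(r"(?:(32|64)|\d+)(?=\D|$)")

src = ['abc123 def3464', 'hello32 goodbye64', 'some numbers 1254324', 'a 123 b']
# .strip() removes only leading/trailing whitespace
stripped = [PATTERN.sub(r"\1", s).strip() for s in src]
# ' '.join(s.split()) also collapses runs of inner whitespace into one space
collapsed = [' '.join(PATTERN.sub(r"\1", s).split()) for s in src]
print(stripped)   # last item still has two spaces: 'a  b'
print(collapsed)  # last item becomes 'a b'
```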
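You can use re.sub with a function as the replacement:
import re
s = ['abc123 def3464', 'hello32 goodbye64', 'some numbers 1254324']
# keep a run of digits only if it is exactly '32' or '64'; otherwise replace it with ''
new_s = [re.sub(r'\d+', lambda x: x.group() if x.group() in ('32', '64') else '', i) for i in s]
print(new_s)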
Output:
['abc def', 'hello32 goodbye64', 'some numbers ']
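Because each \d+ match is a whole run of digits (the scan starts at the first digit and the quantifier is greedy), the membership test compares complete numbers; that is why 3464 is dropped even though it contains 64. Here is a sketch that moves the allowed numbers into a set parameter so they can change without touching the regex. remove_numbers and keep are hypothetical names.

```python
import re

def remove_numbers(text, keep=frozenset({'32', '64'})):
    """Remove every run of digits in text unless the whole run is in keep.

    remove_numbers and its keep parameter are illustrative names.
    """
    return re.sub(r'\d+', lambda m: m.group() if m.group() in keep else '', text)

s = ['abc123 def3464', 'hello32 goodbye64', 'some numbers 1254324']
print([remove_numbers(i) for i in s])           # ['abc def', 'hello32 goodbye64', 'some numbers ']
print(remove_numbers('v8 and v16', keep={'16'}))  # 'v and v16'
```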