Reputation: 198
I have a string:
s= "Classic for older systems. 😃💁 People • 🐻🌻 Animals • 🍔🍹 Food • 🎷⚽ Activities • 🚘🌇 Travel • 💡🎉 Objects • 💖🔣 Symbols ...45.6"
I want to remove symbols, emojis, and the bullet character (•).
The expected output is:
"Classic for older systems People Animals Food Activities Travel Objects Symbols 45.6"
Code:
re.sub(r'([^\s\w]|_)+', '', s)
produces
'Classic for older systems People Animals Food Activities Travel Objects Symbols 456'
This also removes the dot from the floating-point number. How can I fix this?
Upvotes: 0
Views: 51
Reputation: 43169
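Python's built-in re module does not support the PCRE verbs (*SKIP)(*FAIL), but you could mimic them with an alternation and a replacement function: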
import re
s = "Classic for older systems. 😃💁 People • 🐻🌻 Animals • 🍔🍹 Food • 🎷⚽ Activities • 🚘🌇 Travel • 💡🎉 Objects • 💖🔣 Symbols ...45.6"
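# First alternative: decimal numbers to keep; group 1: runs of non-word characters
rx = re.compile(r'\d+\.\d+|(\W+)')
def replacer(match):
    # Group 1 matched: replace the run with spaces of the same length
    if match.group(1) is not None:
        return ' ' * len(match.group(1))
    else:
        # A decimal number matched: return it unchanged
        return match.group(0)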
s = rx.sub(replacer, s)
print(s)
This uses the function replacer as the replacement and yields (consecutive spaces collapsed here):
Classic for older systems People Animals Food Activities Travel Objects Symbols 45.6
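To get single spaces directly, as in the question's expected output, the replacer can return one space per removed run instead of padding to the original length. The following is only a sketch of that variation; the name strip_symbols is hypothetical and not part of the answer. Note that \W does not match the underscore, so unlike the pattern in the question this leaves _ in place.

```python
import re

# Same alternation as above: decimal numbers are matched first and kept,
# any other run of non-word characters lands in group 1.
rx = re.compile(r'\d+\.\d+|(\W+)')

def replacer(match):
    # Group 1 matched: collapse the whole run to a single space
    if match.group(1) is not None:
        return ' '
    # A decimal number matched: keep it as is
    return match.group(0)

def strip_symbols(text):
    # strip() drops a leading or trailing space left by a removed run
    return rx.sub(replacer, text).strip()

print(strip_symbols("Food • 🎷⚽ Activities ...45.6"))
```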
Upvotes: 0
Reputation: 22817
(\d+\.\d+)|[^a-z\d\s]+
(\d+\.\d+)
Captures decimal numbers into the first capture group: one or more digits, a dot, one or more digits.
[^a-z\d\s]+
Matches one or more characters that are not letters a-z, digits, or whitespace. With the i (case-insensitive) flag, a-z also covers the uppercase letters, so those are kept as well.
Replacement: $1
Outputs:
Classic for older systems People Animals Food Activities Travel Objects Symbols 45.6
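In Python's re module the $1 replacement is written \1, and since Python 3.5 an unmatched group in the replacement is substituted with an empty string. A minimal sketch of this pattern in Python, assuming that behaviour, could look like the following; the helper name clean and the final whitespace-collapsing step are additions for illustration, since the raw substitution leaves runs of spaces where symbols stood between words.

```python
import re

s = "Classic for older systems. 😃💁 People • 🐻🌻 Animals • 🍔🍹 Food • 🎷⚽ Activities • 🚘🌇 Travel • 💡🎉 Objects • 💖🔣 Symbols ...45.6"

# Group 1 keeps decimal numbers; the second alternative matches runs of
# characters that are not a-z (any case), digits, or whitespace.
pattern = re.compile(r'(\d+\.\d+)|[^a-z\d\s]+', re.IGNORECASE)

def clean(text):
    # \1 is empty whenever the second alternative matched (Python 3.5+),
    # so symbol runs are deleted and decimal numbers are written back.
    stripped = pattern.sub(r'\1', text)
    # Collapse the leftover runs of whitespace into single spaces.
    return ' '.join(stripped.split())

print(clean(s))
```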
Upvotes: 2