DreamerP

Reputation: 198

Extracting alphanumeric text, integers, and floating-point numbers from a string using Python 3.6

I have a string:

s= "Classic for older systems. 😃💁 People • 🐻🌻 Animals • 🍔🍹 Food • 🎷⚽ Activities • 🚘🌇 Travel • 💡🎉 Objects • 💖🔣 Symbols ...45.6"

I want to remove symbols, emojis, and the • character.

Expected output is as follows:

"Classic for older systems  People   Animals   Food   Activities   Travel   Objects   Symbols 45.6"

Code:

re.sub(r'([^\s\w]|_)+', '', s)

produces

'Classic for older systems  People   Animals   Food   Activities   Travel   Objects   Symbols 456'

It also removes the dot from the floating-point number. How can I fix this?

Upvotes: 0

Views: 51

Answers (2)

Jan

Reputation: 43169

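Python's re module does not support the PCRE verbs (*SKIP)(*FAIL), but you could mimic them by matching what you want to keep first and capturing only what you want to replace: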

import re

s = "Classic for older systems. 😃💁 People • 🐻🌻 Animals • 🍔🍹 Food • 🎷⚽ Activities • 🚘🌇 Travel • 💡🎉 Objects • 💖🔣 Symbols ...45.6"

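# Alternation order matters: floats are tried first and kept as-is;
# only runs of non-word characters end up in group 1.
rx = re.compile(r'\d+\.\d+|(\W+)')

def replacer(match):
    # Group 1 is set only for a non-word run: blank it out with spaces.
    if match.group(1) is not None:
        return ' ' * len(match.group(1))
    else:
        # A float matched: return it unchanged.
        return match.group(0)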

s = rx.sub(replacer, s)
print(s)

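This passes the function replacer as the replacement. Note that \W also matches whitespace, so every run of non-word characters (spaces included) is replaced by the same number of spaces, which yields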

Classic for older systems     People      Animals      Food      Activities      Travel      Objects      Symbols    45.6

Upvotes: 0

ctwheels

Reputation: 22817


(\d+\.\d+)|[^a-z\d\s]+
  • (\d+\.\d+) Captures decimal numbers into the first capture group: one or more digits, a dot, one or more digits.
  • [^a-z\d\s]+ Matches one or more characters that are neither alphanumeric nor whitespace. With the i (case-insensitive) flag, a-z also covers uppercase letters, so they are left untouched as well.

Replacement: $1 (written as \1 in Python's re.sub)

Outputs:

Classic for older systems  People   Animals   Food   Activities   Travel   Objects   Symbols 45.6
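The pattern above is written in generic regex notation, so here is a minimal sketch of how it might be applied in Python. The names pattern, keep_float, and cleaned are only illustrative. A replacement function is used instead of the string \1 so that the unmatched group is explicitly turned into an empty string.

```python
import re

s = "Classic for older systems. 😃💁 People • 🐻🌻 Animals • 🍔🍹 Food • 🎷⚽ Activities • 🚘🌇 Travel • 💡🎉 Objects • 💖🔣 Symbols ...45.6"

# Group 1 captures floats; the second alternative matches anything that is
# not a letter, digit, or whitespace (re.IGNORECASE makes a-z cover A-Z too).
pattern = re.compile(r'(\d+\.\d+)|[^a-z\d\s]+', re.IGNORECASE)

def keep_float(match):
    # Hypothetical helper: keep a captured float, drop everything else.
    return match.group(1) or ''

cleaned = pattern.sub(keep_float, s)
print(cleaned)
```

Because whitespace is never matched, the original spacing between words is preserved, which is why this variant reproduces the spacing of the expected output in the question.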

Upvotes: 2
