Morpheus

Reputation: 3523

Python regex: remove numbers and numbers with punctuation

I have the following string

 line = "1234567 7852853427.111 https://en.wikipedia.org/wiki/Dictionary_(disambiguation)"

I would like to remove the numbers 1234567 and 7852853427.111 using regular expressions.

I have this regex:

nline = re.sub("^\d+\s|\s\d+\s|\s\d\w\d|\s\d+$", " ", line)

but it is not doing what I hoped it would.

Can anyone point me in the right direction?

Upvotes: 5

Views: 9917

Answers (4)

mbomb007

Reputation: 4231

Though you are asking for a regular expression, a better solution would be to use str.split, assuming that your string will always be in the format {number} {number} {hyperlink}.

As @godaygo said, you can use this:

line = line.split()[-1]

The string will be split on whitespace, and we select the last substring.

If you want to access all parts (assuming there are always three), you can use this instead:

num1, num2, url = line.split()
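If the number of leading fields can vary, the three-name unpacking above raises a ValueError. A minimal sketch of one way around that, assuming Python 3 and that the URL is always the last whitespace-separated field (the name numbers is only illustrative):

```python
line = "1234567 7852853427.111 https://en.wikipedia.org/wiki/Dictionary_(disambiguation)"

# Star-unpacking collects every field before the last one into a list,
# so the count of leading numbers does not have to be exactly two.
*numbers, url = line.split()

print(numbers)  # ['1234567', '7852853427.111']
print(url)
```

This still fails on an empty line, since there is no last field to bind to url.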

Upvotes: 0

anubhava

Reputation: 784998

You can use:

>>> import re
>>> line = "1234567 7852853427.111 https://en.wikipedia.org/wiki/Dictionary_(disambiguation)"
>>> print(re.sub(r'\b\d+(?:\.\d+)?\s+', '', line))
https://en.wikipedia.org/wiki/Dictionary_(disambiguation)

The regex \b\d+(?:\.\d+)?\s+ matches an integer or decimal number followed by one or more whitespace characters. \b is a word boundary, so the match cannot start in the middle of a word.
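For reuse, the same pattern can be compiled once and wrapped in a function. This is only a sketch under the assumptions of the question; remove_numbers and NUMBER_THEN_SPACE are hypothetical names, not part of the re module:

```python
import re

# Integer or decimal number starting at a word boundary,
# followed by one or more whitespace characters.
NUMBER_THEN_SPACE = re.compile(r'\b\d+(?:\.\d+)?\s+')

def remove_numbers(text):
    """Remove every whitespace-terminated number from text."""
    return NUMBER_THEN_SPACE.sub('', text)

line = "1234567 7852853427.111 https://en.wikipedia.org/wiki/Dictionary_(disambiguation)"
print(remove_numbers(line))
```

Note that a number at the very end of the string, with no whitespace after it, is left untouched because the pattern requires trailing whitespace.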

Upvotes: 6

B. Farkas

Reputation: 1

I think this is what you want:

nline = re.sub(r"\d+\s\d+\.\d+", "", line)

It removes the numbers from line but leaves the space in front of "http..." in place, because the pattern does not match it. To drop that space as well, append \s to the pattern or call .strip() on the result.

If you also want to record the individual number strings you could put them in groups like this:

>>> result = re.search(r"(\d+)\s(\d+\.\d+)", line)
>>> print(result.group(0))
1234567 7852853427.111
>>> print(result.group(1))
1234567
>>> print(result.group(2))
7852853427.111

A great way to learn and practice regular expressions is regex101.

Upvotes: 0

Moses Koledoye

Reputation: 78546

Here's a non-regex approach, if your regex requirement is not entirely strict, using itertools.dropwhile:

>>> from itertools import dropwhile
>>> ''.join(dropwhile(lambda x: not x.isalpha(), line))
'https://en.wikipedia.org/wiki/Dictionary_(disambiguation)'

Upvotes: 2
