Reputation: 3523
I have the following string
line = "1234567 7852853427.111 https://en.wikipedia.org/wiki/Dictionary_(disambiguation)"
I would like to remove the numbers 1234567 7852853427.111 using regular expressions.
I have this regex:
nline = re.sub("^\d+\s|\s\d+\s|\s\d\w\d|\s\d+$", " ", line)
but it is not doing what I hoped it would.
Can anyone point me in the right direction?
Upvotes: 5
Views: 9917
Reputation: 4231
Though you are asking for a regular expression, a better solution would be to use str.split, assuming that your string will always be in the format {number} {number} {hyperlink}.
As @godaygo said, you can use this:
line = line.split()[-1]
The string will be split on whitespace, and we select the last substring.
If you want to access all parts (assuming there's always three), you can use this instead:
num1, num2, url = line.split()
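If a line might not have exactly three fields, the unpacking above raises a ValueError. Here is a minimal sketch of a more forgiving variant, assuming the URL is always the last whitespace-separated field (the helper name extract_url is made up for illustration):

```python
def extract_url(line):
    """Return the last whitespace-separated field, or None for a blank line."""
    parts = line.split()
    # An empty list means the line held only whitespace (or nothing at all).
    return parts[-1] if parts else None


line = "1234567 7852853427.111 https://en.wikipedia.org/wiki/Dictionary_(disambiguation)"
print(extract_url(line))
```

This only looks at the last field, so it does not check that the earlier fields are actually numbers.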
Upvotes: 0
Reputation: 784998
You can use:
>>> import re
>>> line = "1234567 7852853427.111 https://en.wikipedia.org/wiki/Dictionary_(disambiguation)"
>>> print(re.sub(r'\b\d+(?:\.\d+)?\s+', '', line))
https://en.wikipedia.org/wiki/Dictionary_(disambiguation)
The regex \b\d+(?:\.\d+)?\s+ will match an integer or decimal number followed by one or more whitespace characters. \b is a word boundary.
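Because the pattern is not anchored, it would also remove a number followed by whitespace that appears later in the string. If only the leading numbers should go, one possible sketch anchors the match at the start. The sample line with trailing text and the name LEADING_NUMBERS are assumptions for illustration, not part of the question:

```python
import re

# One or more "number + whitespace" tokens, but only at the very start.
LEADING_NUMBERS = re.compile(r'^(?:\d+(?:\.\d+)?\s+)+')

# Hypothetical input with a number after the URL.
line = "1234567 7852853427.111 https://example.org/page 42 trailing"

anchored = LEADING_NUMBERS.sub('', line)
unanchored = re.sub(r'\b\d+(?:\.\d+)?\s+', '', line)

print(anchored)    # keeps the "42" after the URL
print(unanchored)  # also strips the "42 "
```

For the exact string in the question both versions give the same result; the difference only shows up when numbers can follow the URL.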
Upvotes: 6
Reputation: 1
I think this is what you want:
nline = re.sub(r"\d+\s\d+\.\d+", "", line)
It removes the numbers from line. Note that the space in front of "http..." is left in place; to remove it as well, append \s+ to the pattern or call .strip() on the result.
If you also want to record the individual number strings you could put them in groups like this:
>>> result = re.search(r"(\d+)\s(\d+\.\d+)", line)
>>> print(result.group(0))
1234567 7852853427.111
>>> print(result.group(1))
1234567
>>> print(result.group(2))
7852853427.111
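If the numbered groups get hard to keep track of, named groups are an option. This is only a sketch: the group names first, second and url are my own choice, and it assumes the URL contains no whitespace:

```python
import re

line = "1234567 7852853427.111 https://en.wikipedia.org/wiki/Dictionary_(disambiguation)"

# (?P<name>...) gives each captured part a readable name.
pattern = re.compile(r"(?P<first>\d+)\s+(?P<second>\d+\.\d+)\s+(?P<url>\S+)")

match = pattern.match(line)
# groupdict() maps each group name to the text it captured.
fields = match.groupdict() if match else {}
print(fields.get("url"))
```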
A great way to learn and practice regular expressions is regex101.
Upvotes: 0
Reputation: 78546
Here's a non-regex approach, if your regex requirement is not entirely strict, using itertools.dropwhile:
>>> from itertools import dropwhile
>>> ''.join(dropwhile(lambda x: not x.isalpha(), line))
'https://en.wikipedia.org/wiki/Dictionary_(disambiguation)'
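dropwhile only skips items while the predicate holds and then yields everything from the first letter onward, so digits that appear later in the URL are kept. A small sketch with a made-up URL that contains digits:

```python
from itertools import dropwhile

# Hypothetical line whose URL ends in digits.
line = "1234567 7852853427.111 https://example.org/page42"

# Skip characters until the first alphabetic one, then keep the rest unchanged.
url = ''.join(dropwhile(lambda ch: not ch.isalpha(), line))
print(url)
```

This assumes the part you want to keep starts with a letter, which is true for anything beginning with "http".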
Upvotes: 2