Gedanggoreng

Reputation: 321

Python - How to Remove Words That Start With a Number and Contain a Period

What is the best way to remove words in a string that start with numbers and contain periods in Python?

this_string = 'lorum3 ipsum 15.2.3.9.7 bar foo 1. v more text 46 2. here and even more text here v7.8.989'

If I use Regex:

re.sub(r'[0-9]*\.\w*', '', this_string)

The result will be:

'lorum3 ipsum  bar foo  v more text 46  here and even more text here v'

I expect the word v7.8.989 to be kept, since it starts with a letter.

It would also be great if removing the words did not leave extra spaces behind. My regex above still leaves them.

Upvotes: 2

Views: 553

Answers (4)

Nick

Reputation: 147256

You can use this regex to match the strings you want to remove:

(?:^|\s)[0-9]+\.[0-9.]*(?=\s|$)

It matches:

  • (?:^|\s) : beginning of string or whitespace
  • [0-9]+ : at least one digit
  • \. : a period
  • [0-9.]* : some number of digits and periods
  • (?=\s|$) : a lookahead to assert end of string or whitespace
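As a rough illustration of the breakdown above, the same pattern can be written with re.VERBOSE so that each part carries its own comment. The names TOKEN, sample and cleaned, and the short sample string, are assumptions made for this sketch only.

```python
import re

# Same pattern as in the list above, spelled out with re.VERBOSE.
# TOKEN is a hypothetical name used only for this sketch.
TOKEN = re.compile(r"""
    (?:^|\s)     # beginning of string or one whitespace character
    [0-9]+       # at least one digit
    \.           # a literal period
    [0-9.]*      # any further digits and periods
    (?=\s|$)     # lookahead: whitespace or end of string, not consumed
""", re.VERBOSE)

sample = 'foo 1. bar 15.2.3 v7.8 46'   # made-up sample input
cleaned = TOKEN.sub('', sample)
print(cleaned)  # foo bar v7.8 46
```

Because the leading whitespace is part of the match, it is removed together with the token, so no double spaces are left in the middle of the string.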


You can then replace any matches with the empty string. In Python:

import re

this_string = 'lorum3 ipsum 15.2.3.9.7 bar foo 1. v more text 46 2. here and even more text here v7.8.989 and also 1.2.3c as well'
result = re.sub(r'(?:^|\s)[0-9]+\.[0-9.]*(?=\s|$)', '', this_string)
print(result)

Output:

lorum3 ipsum bar foo v more text 46 here and even more text here v7.8.989 and also 1.2.3c as well

Upvotes: 4

The fourth bird

Reputation: 163632

If you can make use of a lookbehind, you can match the numbers and replace with an empty string:

(?<!\S)\d+\.[\d.]*(?!\S)

Explanation

  • (?<!\S) Assert a whitespace boundary to the left
  • \d+\.[\d.]* Match 1+ digits, then a dot followed by optional digits or dots
  • (?!\S) Assert a whitespace boundary to the right


If you want to match an optional leading whitespace char:

\s?(?<!\S)\d+\.[\d.]*(?!\S)
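A minimal sketch of how the two patterns might be used from Python, applied to the string from the question. The variable names token_only and with_space are assumptions for illustration. Python's re module supports fixed-width lookbehind, which is all these patterns need.

```python
import re

this_string = 'lorum3 ipsum 15.2.3.9.7 bar foo 1. v more text 46 2. here and even more text here v7.8.989'

# First pattern: removes only the token, so the space before it stays behind.
token_only = re.sub(r'(?<!\S)\d+\.[\d.]*(?!\S)', '', this_string)

# Second pattern: also consumes one optional leading whitespace character.
with_space = re.sub(r'\s?(?<!\S)\d+\.[\d.]*(?!\S)', '', this_string)

print(repr(token_only))
# 'lorum3 ipsum  bar foo  v more text 46  here and even more text here v7.8.989'
print(repr(with_space))
# 'lorum3 ipsum bar foo v more text 46 here and even more text here v7.8.989'
```

The first result still contains double spaces where a token was removed; the second does not, which is closer to what the question asks for.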


Upvotes: 2

M..

Reputation: 26

You can try this regex:

(^|\s)\d[^\s]*\.+[^\s]*

This also matches strings such as '7.a.0.1', which start with a digit and contain a period but include letters as well.
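A small sketch of that behaviour in Python, using a made-up sample string (the names pattern, sample and cleaned are assumptions for illustration):

```python
import re

# Pattern from this answer: start or whitespace, a digit, then any
# non-whitespace characters as long as at least one period is present.
pattern = r'(^|\s)\d[^\s]*\.+[^\s]*'

sample = 'alpha v7.8.989 beta 7.a.0.1 gamma 1.2.3c delta 46'  # made-up input
cleaned = re.sub(pattern, '', sample)
print(cleaned)  # alpha v7.8.989 beta gamma delta 46
```

Words that start with a letter (v7.8.989) or contain no period (46) are kept, while 7.a.0.1 and 1.2.3c are removed along with the whitespace before them.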


Upvotes: 1

mahesh

Reputation: 1098

If you don't want to use regex, you can also do it using simple string operations:

# Keep every word unless it starts with a digit and contains a period;
# joining with a single space avoids a trailing space in the result.
res = ' '.join(e for e in this_string.split() if not (e[0] in '0123456789' and '.' in e))

Upvotes: 1
