Pratik V

Reputation: 70

Pandas regex to remove digits before consecutive dots

I have the string Node57Name123.grpObject12.grp23Symbol43.shape52.anotherobject25.shape23 and want to clean it up as follows:

  1. Remove the digits that come after a word and right before a dot (or the end of the string).
  2. Leave the first part of the string, i.e. "Node57Name123", untouched.
  3. Keep digits that sit inside a word.

I tried re.sub(r"\d+","",string), but it removed every digit, including the ones I want to keep.

The output should look like this "Node57Name123.grpObject.grp23Symbol.shape.anotherobject.shape"
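For reference, a minimal reproduction of the attempt described above. It shows that a bare \d+ pattern has no notion of position, so it strips every run of digits, including those in the first segment and inside words:

```python
import re

s = "Node57Name123.grpObject12.grp23Symbol43.shape52.anotherobject25.shape23"

# \d+ matches any run of digits anywhere in the string,
# so nothing restricts it to digits that sit right before a dot.
stripped_all = re.sub(r"\d+", "", s)
print(stripped_all)
# NodeName.grpObject.grpSymbol.shape.anotherobject.shape
```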

Can you please point me in the right direction?

Upvotes: 1

Views: 710

Answers (2)

JvdV

Reputation: 75840

Just to give you a non-regex alternative using rstrip(): we can pass this method a set of characters to remove from the right end of the string, e.g. rstrip('0123456789'). Alternatively, we can use the digits constant from the string module:

from string import digits
s = 'Node57Name123.grpObject12.grp23Symbol43.shape52.anotherobject25.shape23'
x = '.'.join([s.split('.')[0]] + [i.rstrip(digits) for i in s.split('.')[1:]])
print(x)

Prints:

Node57Name123.grpObject.grp23Symbol.shape.anotherobject.shape
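If this is needed in more than one place, the same idea can be wrapped in a small helper. This is only a sketch: the function name strip_trailing_digits is made up for illustration, and it uses str.partition so the string is split off from its first segment only once and strings without any dot are returned unchanged.

```python
from string import digits


def strip_trailing_digits(s):
    """Strip trailing digits from every dot-separated part except the first.

    Hypothetical helper name; the logic mirrors the rstrip(digits) one-liner.
    """
    # partition() splits on the first dot only: head is the part to keep as-is.
    head, sep, tail = s.partition('.')
    if not sep:
        # No dot at all: there is only a "first part", so nothing to strip.
        return s
    cleaned = (part.rstrip(digits) for part in tail.split('.'))
    return head + sep + '.'.join(cleaned)


s = 'Node57Name123.grpObject12.grp23Symbol43.shape52.anotherobject25.shape23'
print(strip_trailing_digits(s))
# Node57Name123.grpObject.grp23Symbol.shape.anotherobject.shape
```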

EDIT:

If you must use a regular expression, it seems that the following pattern covers your sample:

(\.[^.]*?)\d+\b

Replace with the 1st capture group, see the online demo

  • ( - Open capture group:
    • \.[^.]*? - A literal dot followed by 0+ non-dot characters (lazy).
  • ) - Close capture group.
  • \d+\b - Match 1+ digits up to a word-boundary.

A sample:

import re
s = 'Node57Name123.grpObject12.grp23Symbol43.shape52.anotherobject25.shape23'
x = re.sub(r'(\.[^.]*?)\d+\b', r'\1', s)
print(x)

Prints:

Node57Name123.grpObject.grp23Symbol.shape.anotherobject.shape
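One thing to be aware of: \b is a word boundary, not "dot or end of string", so the regex and the rstrip() version can disagree once a part contains other non-word characters. The input below is a made-up example with a hyphen (not from the question) to illustrate the difference:

```python
import re
from string import digits

# Hypothetical input: the second part contains a hyphen.
s = 'a1.shape52-x9.end7'

# Regex version: \b also matches between '2' and '-', so "52" is removed,
# while "x9" is left alone because that match has already been consumed.
regex_result = re.sub(r'(\.[^.]*?)\d+\b', r'\1', s)

# rstrip version: only digits at the very end of each part are removed.
parts = s.split('.')
rstrip_result = '.'.join([parts[0]] + [p.rstrip(digits) for p in parts[1:]])

print(regex_result)   # a1.shape-x9.end
print(rstrip_result)  # a1.shape52-x.end
```

For strings made up only of letters, digits and dots, as in the question, both approaches give the same result.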

Upvotes: 1

Wiktor Stribiżew

Reputation: 626738

You can use

re.sub(r'^([^.]*\.)|\d+(?![^.])', r'\1', text)

See the regex demo.

Details:

  • ^([^.]*\.) - zero or more chars other than a dot, followed by a . char, at the start of the string, captured into Group 1 (referred to with \1 in the replacement pattern)
  • | - or
  • \d+(?![^.]) - one or more digits followed by a dot or the end of the string (equivalent to (?=\.|$) here).

See the Python demo:

import re
text = r'Node57Name123.grpObject12.grp23Symbol43.shape52.anotherobject25.shape23'
print( re.sub(r'^([^.]*\.)|\d+(?![^.])', r'\1', text) )
## => Node57Name123.grpObject.grp23Symbol.shape.anotherobject.shape
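As a quick check of the lookahead note in the details, the sketch below runs the pattern with both lookahead spellings on the sample string. The variable names are illustrative. The last two lines show one case where the spellings differ: in Python, $ also matches just before a trailing newline, whereas (?![^.]) treats the newline as an ordinary non-dot character.

```python
import re

text = 'Node57Name123.grpObject12.grp23Symbol43.shape52.anotherobject25.shape23'

# Negative lookahead: the digits must not be followed by a non-dot character.
negative = re.sub(r'^([^.]*\.)|\d+(?![^.])', r'\1', text)
# Positive lookahead: the digits must be followed by a dot or the end of the string.
positive = re.sub(r'^([^.]*\.)|\d+(?=\.|$)', r'\1', text)

print(negative == positive)  # True
print(negative)
# Node57Name123.grpObject.grp23Symbol.shape.anotherobject.shape

# With a trailing newline the two lookaheads behave differently.
with_newline_negative = re.sub(r'\d+(?![^.])', '', 'shape23\n')  # 'shape23\n'
with_newline_positive = re.sub(r'\d+(?=\.|$)', '', 'shape23\n')  # 'shape\n'
```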

Upvotes: 2
