Bartek Malysz

Reputation: 1032

Separate upper-case chars with digits from lower-case chars with digits

I have a column Name with data in the format below:

  Name              Name2
0 MORR1223ldkeha12  ldkeha12
1 FRAN2771yetg4fq1  yetg4fq1
2 MORR56333gft4tsd1 gft4tsd1

I want to extract the part of Name shown in column Name2. The pattern is 4 upper-case chars followed by 4-5 digits, and I'm interested in whatever follows those 4-5 digits.

Is there any way to achieve this?

Upvotes: 0

Views: 42

Answers (4)

Chabu

Reputation: 141

If you change your regex to '(^[A-Z]{4})([0-9]{4,5})(.+)', you can access the different parts through the groups of the match result.

So in Anil's code, group(0) will return the whole match, group(1) the four upper-case letters, group(2) the digits, and group(3) the rest.
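
A minimal sketch of reading those groups, using the sample names from the question. The helper name split_name is only for illustration and is not part of anyone's code here:

```python
import re

# The pattern suggested above: prefix, digits and remainder each in their own group.
PATTERN = re.compile(r'(^[A-Z]{4})([0-9]{4,5})(.+)')

def split_name(name):
    """Return (whole, prefix, digits, rest), or None if the name does not match."""
    m = PATTERN.search(name)
    if m is None:
        return None
    return m.group(0), m.group(1), m.group(2), m.group(3)

print(split_name('MORR56333gft4tsd1'))
# ('MORR56333gft4tsd1', 'MORR', '56333', 'gft4tsd1')
```

Returning None for a non-matching name is just one way to handle bad input; raising an error would work as well.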

Upvotes: 0

Rakesh

Reputation: 82785

Using str.extract:

import pandas as pd

df = pd.DataFrame({"Name": ['MORR1223ldkeha12', 'FRAN2771yetg4fq1', 'MORR56333gft4tsd1']})
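# Anchor on the 4 upper-case letters so the 4-5 digits are matched right after them;
# expand=False returns a Series for the single capture group
df["Name2"] = df["Name"].str.extract(r"^[A-Z]{4}\d{4,5}(.*)", expand=False)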
print(df)

Output:

                Name     Name2
0   MORR1223ldkeha12  ldkeha12
1   FRAN2771yetg4fq1  yetg4fq1
2  MORR56333gft4tsd1  gft4tsd1

Upvotes: 1

Anil Kumar

Reputation: 102

You can try the logic below:

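import re

_names = ['MORR1223ldkeha12', 'FRAN2771yetg4fq1', 'MORR56333gft4tsd1']
result = []
for _name in _names:
    # 4 upper-case letters, 4-5 digits, then capture everything after them
    m = re.search(r'^[A-Z]{4}[0-9]{4,5}(.+)', _name)
    if m:  # skip names that do not follow the pattern
        result.append(m.group(1))
print(result)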

Upvotes: 2

Ahorn

Reputation: 649

You could use a regex to check whether there are 4 or 5 digits, and then remove either the first 8 or the first 9 characters. If the pattern ^[A-Z]{4}[0-9]{5}.* matches, there are 5 digits; otherwise there are 4.
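
A rough sketch of that idea in Python. The function name strip_prefix is made up for illustration, and it assumes every name really starts with 4 upper-case letters and 4-5 digits, and that the part you want never begins with a digit (otherwise 4 digits plus a leading digit would be read as 5):

```python
import re

# Matches only when the 4 upper-case letters are followed by 5 digits.
FIVE_DIGITS = re.compile(r'^[A-Z]{4}[0-9]{5}')

def strip_prefix(name):
    # 4 letters + 5 digits = 9 characters to drop, otherwise 4 + 4 = 8
    cut = 9 if FIVE_DIGITS.match(name) else 8
    return name[cut:]

names = ['MORR1223ldkeha12', 'FRAN2771yetg4fq1', 'MORR56333gft4tsd1']
print([strip_prefix(n) for n in names])
# ['ldkeha12', 'yetg4fq1', 'gft4tsd1']
```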

Upvotes: 0
