Reputation: 1032
I have a column Name with data in the format below:
Name Name2
0 MORR1223ldkeha12 ldkeha12
1 FRAN2771yetg4fq1 yetg4fq1
2 MORR56333gft4tsd1 gft4tsd1
I want to extract the part of Name shown in column Name2. Each value follows a pattern of 4 uppercase characters followed by 4-5 digits, and I'm interested in whatever comes after those digits.
Is there any way to achieve this?
Upvotes: 0
Views: 42
Reputation: 141
If you change your regex to '(^[A-Z]{4})([0-9]{4,5})(.+)', you can access the different parts through the capture groups of the match result.
So in Anil's code, group(0) returns the whole match, group(1) the first group (the 4 letters), group(2) the second one (the digits), and group(3) the rest.
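A minimal sketch of how those groups could be read, using the sample names from the question. The helper name split_name is made up for illustration and is not part of Anil's answer.

```python
import re

# Three capture groups: 4 uppercase letters, 4-5 digits, then the remainder.
PATTERN = re.compile(r'(^[A-Z]{4})([0-9]{4,5})(.+)')

def split_name(name):
    """Return (letters, digits, rest), or None if the name does not match."""
    m = PATTERN.search(name)
    if m is None:
        return None
    # group(0) would be the whole match; groups 1-3 are the individual parts.
    return m.group(1), m.group(2), m.group(3)

print(split_name('MORR56333gft4tsd1'))
```

The digit group is greedy, so it takes 5 digits whenever 5 are available and falls back to 4 otherwise.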
Upvotes: 0
Reputation: 82785
Using str.extract
import pandas as pd
df = pd.DataFrame({"Name": ['MORR1223ldkeha12', 'FRAN2771yetg4fq1', 'MORR56333gft4tsd1']})
df["Name2"] = df["Name"].str.extract(r"\d{4,5}(.*)")
print(df)
Output:
Name Name2
0 MORR1223ldkeha12 ldkeha12
1 FRAN2771yetg4fq1 yetg4fq1
2 MORR56333gft4tsd1 gft4tsd1
Upvotes: 1
Reputation: 102
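You can try the logic below:
import re
_names = ['MORR1223ldkeha12', 'FRAN2771yetg4fq1', 'MORR56333gft4tsd1']
result = []
for _name in _names:
    # 4 uppercase letters, 4-5 digits, then capture whatever follows
    m = re.search(r'^[A-Z]{4}[0-9]{4,5}(.+)', _name)
    if m:  # skip names that do not follow the pattern
        result.append(m.group(1))
print(result)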
Upvotes: 2
Reputation: 649
You could use a regex to check whether there are 4 or 5 digits and then remove the first 8 or 9 characters accordingly. If the pattern ^[A-Z]{4}[0-9]{5}.* matches, there are 5 digits; otherwise there are 4.
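A rough sketch of this check-then-slice idea, assuming every name starts with 4 uppercase letters and at least 4 digits. The helper name strip_prefix is hypothetical.

```python
import re

# Matches only when the 4 letters are followed by 5 digits.
FIVE_DIGITS = re.compile(r'^[A-Z]{4}[0-9]{5}.*')

def strip_prefix(name):
    # 4 letters + 5 digits = 9 characters to drop; otherwise assume
    # 4 digits and drop 8 characters.
    cut = 9 if FIVE_DIGITS.match(name) else 8
    return name[cut:]

names = ['MORR1223ldkeha12', 'FRAN2771yetg4fq1', 'MORR56333gft4tsd1']
print([strip_prefix(n) for n in names])
# prints ['ldkeha12', 'yetg4fq1', 'gft4tsd1']
```

Note that this does not validate the input: a name that does not follow the pattern at all still loses its first 8 characters.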
Upvotes: 0