I am trying to split the first name from the second name based on a pattern, but I do NOT want to split when that pattern occurs in numbers.
Input:
name
john 6/1
park/avenue 34/45
eela 21/22
shaun 21/22
shaun/paul 77/78
Code:
import pandas as pd
import re
df1=pd.read_csv('bg.txt',sep='\t')
df1['split?']=df1['name1'].apply(lambda a: 'yes' if (re.search('[^\d+\/d+]',a) and re.search('[\u0061-\u007A]',a)) else 'no')
df1['name_2'] = df1[df1['split?']=='yes']['name1'].apply (lambda b: b.split('/')[1])
print(df1)
Expected Output:
name1 split? name2
john 6/1 no null
park/avenue 34/45 yes avenue
eela 21/22 no null
shaun 21/22 no null
shaun/paul 77/78 yes paul
mark/tyson yes tyson
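A side note on why the code above flags nearly every row: inside square brackets, \d, +, / and d are just members of a set, so [^\d+\/d+] is a negated character class that matches any single character that is not a digit, +, / or d. It does not mean "digits/digits". The sketch below (standard re module only, sample strings taken from the input) shows that any string containing a letter other than d, or a space, already matches:

```python
import re

# The question's pattern: a negated character class matching ONE character
# that is not a digit, '+', '/' or 'd'. It does not describe "NN/NN".
pattern = r'[^\d+\/d+]'

samples = ['john 6/1', 'park/avenue 34/45', 'eela 21/22', '6/1']

# True means re.search found at least one character outside that set.
results = {s: bool(re.search(pattern, s)) for s in samples}
print(results)
```

Only a string made entirely of digits, slashes, plus signs and d characters (like 6/1) fails to match, which is why john 6/1 is marked yes and then split('/')[1] returns the digit after the slash.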
Upvotes: 0
Views: 104
Reputation: 627103
You may use a pattern like [^\W\d_]+/([^\W\d_]+) that matches one or more Unicode letters, then /, and then captures one or more Unicode letters in Group 1. You may also want to add word boundaries so that it only matches whole words:
df['name2'] = df['name'].str.extract(r'\b[^\W\d_]+/([^\W\d_]+)\b', expand=False)
df['split?'] = df['name2'].notna().map({False:'no', True:'yes'})
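To get the string null instead of NaN, you may add a df['name2'] = df['name2'].fillna('null') line after the split? column has been computed (otherwise notna() would treat 'null' as a value and mark every row yes).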
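Putting the word-boundary pattern and the fillna('null') suggestion together, a minimal sketch on the sample rows could look like this (it assumes the column is called name, as in the demo). The split? flag is derived before NaN is replaced, because notna() is True for the string 'null':

```python
import pandas as pd

df = pd.DataFrame({'name': ['john 6/1', 'park/avenue 34/45', 'eela 21/22',
                            'shaun 21/22', 'shaun/paul 77/78', 'mark/tyson']})

# [^\W\d_] is "any Unicode letter", so digit pairs like 34/45 never match;
# \b...\b restricts the match to whole letters/letters words.
df['name2'] = df['name'].str.extract(r'\b[^\W\d_]+/([^\W\d_]+)\b', expand=False)

# Compute the yes/no flag while the non-matches are still NaN ...
df['split?'] = df['name2'].notna().map({False: 'no', True: 'yes'})
# ... and only then replace NaN with the literal string 'null'.
df['name2'] = df['name2'].fillna('null')
print(df)
```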
Python demo:
import pandas as pd
cols = {'name':['john 6/1','park/avenue 34/45','eela 21/22','shaun 21/22','shaun/paul 77/78','mark/tyson']}
df = pd.DataFrame(cols)
df['name2'] = df['name'].str.extract(r'[^\W\d_]+/([^\W\d_]+)', expand=False)
df['split?'] = df['name2'].notna().map({False:'no', True:'yes'})
Output:
>>> df
name name2 split?
0 john 6/1 NaN no
1 park/avenue 34/45 avenue yes
2 eela 21/22 NaN no
3 shaun 21/22 NaN no
4 shaun/paul 77/78 paul yes
5 mark/tyson tyson yes
Upvotes: 0
Reputation: 88276
You can use str.extract with the following pattern:
df['name2'] = df.name.str.extract(r'/(\w+)\s\d+/')
df['split'] = df.name2.notna().map({False:'No', True:'Yes'})
print(df)
name name2 split
0 john 6/1 NaN No
1 park/avenue 34/45 avenue Yes
2 eela 21/22 NaN No
3 shaun 21/22 NaN No
4 shaun/paul 77/78 paul Yes
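Note that this pattern also requires whitespace, digits and a second / after the captured word, so a row without a trailing number pair is not split. A quick check with the mark/tyson row from the question's expected output added (a sketch; expand=False is used so extract returns a Series):

```python
import pandas as pd

df = pd.DataFrame({'name': ['john 6/1', 'park/avenue 34/45', 'eela 21/22',
                            'shaun 21/22', 'shaun/paul 77/78', 'mark/tyson']})

# /(\w+)\s\d+/ only matches when the captured word is followed by
# whitespace, one or more digits and another '/', e.g. "/avenue 34/".
df['name2'] = df['name'].str.extract(r'/(\w+)\s\d+/', expand=False)
df['split'] = df['name2'].notna().map({False: 'No', True: 'Yes'})
print(df)
```

With this pattern mark/tyson comes out as NaN / No, whereas the expected output lists tyson / yes for it.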
Upvotes: 1
Reputation: 82785
Using str.extract
Example:
import pandas as pd
df = pd.DataFrame({"Col": ['john 6/1', 'park/avenue 34/45', 'eela 21/22', 'shaun 21/22', 'shaun/paul 77/78']})
df['New'] = df['Col'].str.extract(r"\/([A-Za-z]+)")
print(df)
Output:
Col New
0 john 6/1 NaN
1 park/avenue 34/45 avenue
2 eela 21/22 NaN
3 shaun 21/22 NaN
4 shaun/paul 77/78 paul
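Since [A-Za-z] only covers ASCII letters, a second name containing an accented letter would be cut short. The comparison below uses a hypothetical row ana/josé 1/2 (not from the question) to contrast it with the Unicode-letter class [^\W\d_] used earlier in this thread; it is a sketch, not a claim about the asker's real data:

```python
import pandas as pd

# 'mark/tyson' is from the question's expected output; 'ana/josé 1/2' is a
# hypothetical row added only to show the non-ASCII difference.
df = pd.DataFrame({'Col': ['john 6/1', 'mark/tyson', 'ana/josé 1/2']})

# [A-Za-z]+ stops at the first non-ASCII letter ('é') ...
df['ascii'] = df['Col'].str.extract(r'/([A-Za-z]+)', expand=False)
# ... while [^\W\d_]+ (letters of any script, no digits or '_') keeps it.
df['unicode'] = df['Col'].str.extract(r'/([^\W\d_]+)', expand=False)
print(df)
```

Both patterns skip john 6/1 (a digit follows the slash) and both handle mark/tyson, so the difference only shows up for non-ASCII names.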
Upvotes: 0