user12691361

How to split a string on the pattern "alphabet/alphabet" but not on "number/number" in the same string

I am trying to separate the first name from the second name based on a pattern, but I do NOT want to split when that pattern occurs between numbers.

Input:

name1
john 6/1
park/avenue 34/45
eela 21/22
shaun 21/22
shaun/paul 77/78

code:


import pandas as pd
import re



df1=pd.read_csv('bg.txt',sep='\t')
df1['split?']=df1['name1'].apply(lambda a: 'yes' if  (re.search('[^\d+\/d+]',a) and re.search('[\u0061-\u007A]',a))  else 'no')
df1['name_2'] = df1[df1['split?']=='yes']['name1'].apply (lambda b: b.split('/')[1])
print(df1)

Expected Output:

name1                 split?    name2
john 6/1              no        null
park/avenue 34/45     yes       avenue
eela 21/22            no        null
shaun 21/22           no        null
shaun/paul 77/78      yes       paul
mark/tyson            yes       tyson

Upvotes: 0

Views: 104

Answers (3)

Wiktor Stribiżew

Reputation: 627103

You may use a pattern like [^\W\d_]+/([^\W\d_]+), which matches one or more Unicode letters, then /, and then captures one or more Unicode letters in Group 1. You will probably want to wrap it in word boundaries so that it only matches whole words:

df['name2'] = df['name'].str.extract(r'\b[^\W\d_]+/([^\W\d_]+)\b', expand=False)
df['split?'] = df['name2'].notna().map({False:'no', True:'yes'})

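To get the string null instead of NaN, add a df['name2'] = df['name2'].fillna('null') line. Put it after the split? line, because once the NaN values are filled, notna() returns True for every row.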

Python demo:

import pandas as pd

cols = {'name':['john 6/1','park/avenue 34/45','eela 21/22','shaun 21/22','shaun/paul 77/78','mark/tyson']}
df = pd.DataFrame(cols)
df['name2'] = df['name'].str.extract(r'[^\W\d_]+/([^\W\d_]+)', expand=False)
df['split?'] = df['name2'].notna().map({False:'no', True:'yes'})

Output:

>>> df
                name   name2 split?
0           john 6/1     NaN     no
1  park/avenue 34/45  avenue    yes
2         eela 21/22     NaN     no
3        shaun 21/22     NaN     no
4   shaun/paul 77/78    paul    yes
5         mark/tyson   tyson    yes
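
For a quick check outside pandas, the same letters-only pattern can be tried with the standard re module. This is only a minimal sketch: the helper name second_name and the extra sample 'x9ab/cd7' are made up for illustration, and it assumes the \b variant of the pattern.

```python
import re

# [^\W\d_] = "word character" minus digits and underscore, i.e. letters only.
LETTER_PAIR = re.compile(r'\b[^\W\d_]+/([^\W\d_]+)\b')


def second_name(text):
    """Return the letters after '/' when both sides are whole words of letters, else None."""
    match = LETTER_PAIR.search(text)
    return match.group(1) if match else None


# 'x9ab/cd7' is a hypothetical input showing what the word boundaries change.
for sample in ['john 6/1', 'park/avenue 34/45', 'mark/tyson', 'x9ab/cd7']:
    print(repr(sample), '->', second_name(sample))

# Without \b the pattern would also match inside mixed tokens:
print(re.search(r'[^\W\d_]+/([^\W\d_]+)', 'x9ab/cd7').group(1))  # cd
```

With the boundaries in place, 'x9ab/cd7' is rejected because neither side of the slash is a whole word made only of letters.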

Upvotes: 0

yatu

Reputation: 88276

You can use str.extract with the following pattern:

df['name2'] = df.name.str.extract(r'/(\w+)\s\d+/')
df['split'] = df.name2.notna().map({False:'No', True:'Yes'})

print(df)

                name   name2 split
0           john 6/1     NaN    No
1  park/avenue 34/45  avenue   Yes
2         eela 21/22     NaN    No
3        shaun 21/22     NaN    No
4   shaun/paul 77/78    paul   Yes
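
Note that this pattern relies on the captured word being followed by whitespace and a number pair, so it is tied to the layout of the sample rows. A small sketch with plain re shows where it stops behaving as intended. The helper name extract_name and the input '12/34 56/78' are illustrative assumptions; 'mark/tyson' is taken from the question's expected output.

```python
import re

# Same pattern as above: '/', a word, whitespace, digits, then another '/'.
PATTERN = re.compile(r'/(\w+)\s\d+/')


def extract_name(text):
    """Return the captured word, or None when the pattern does not match."""
    match = PATTERN.search(text)
    return match.group(1) if match else None


print(extract_name('park/avenue 34/45'))  # avenue
print(extract_name('eela 21/22'))         # None
print(extract_name('mark/tyson'))         # None, no trailing number pair
print(extract_name('12/34 56/78'))        # 34, since \w also matches digits
```

If rows without a trailing number pair, or rows with two number pairs, can occur in the real data, a letters-only pattern is likely the safer choice.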

Upvotes: 1

Rakesh

Reputation: 82785

You can use str.extract with a pattern that captures the letters following the slash.

Example:

import pandas as pd

df = pd.DataFrame({"Col": ['john 6/1', 'park/avenue 34/45', 'eela 21/22', 'shaun 21/22', 'shaun/paul 77/78']})
# Capture one or more ASCII letters that directly follow a slash
df['New'] = df['Col'].str.extract(r"\/([A-Za-z]+)")
print(df)

Output:

                 Col     New
0           john 6/1     NaN
1  park/avenue 34/45  avenue
2         eela 21/22     NaN
3        shaun 21/22     NaN
4   shaun/paul 77/78    paul
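
One thing to keep in mind with this pattern: it only checks that letters follow the slash, not that letters precede it. A rough sketch with plain re illustrates the difference. The input '12/ab', the helper first_group, and the lookbehind variant are illustrative additions, not part of the code above.

```python
import re

# Pattern from the answer: letters after the slash only.
after_slash = re.compile(r"/([A-Za-z]+)")
# Hypothetical variant: also require a letter immediately before the slash.
both_sides = re.compile(r"(?<=[A-Za-z])/([A-Za-z]+)")


def first_group(pattern, text):
    """Return group 1 of the first match, or None when nothing matches."""
    match = pattern.search(text)
    return match.group(1) if match else None


print(first_group(after_slash, 'park/avenue 34/45'))  # avenue
print(first_group(after_slash, '12/ab'))              # ab
print(first_group(both_sides, '12/ab'))               # None
print(first_group(both_sides, 'shaun/paul 77/78'))    # paul
```

For the sample rows both patterns give the same result; they only differ on mixed tokens such as a number followed by a slash and letters.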

Upvotes: 0
