Zephyr

Reputation: 1352

Splitting a string with multiple different splitters in pandas

One of the columns contains strings. I want to split each string, but there is no single character I can use as a splitter. Below is a sample data frame:

df = pd.DataFrame({'Name':['John','David'],'Occupation':['CEO','Dep Dir'],'Contact':['HP No-Mobile Ph 123:456','Off-Mobile Ph 152:256']})

What I want to do is split Contact. My desired output is as follows: [image: desired output]

I used the following code to split at '-'.

df[['Contact1','Contact2']] = df.Contact.str.split('[-]',expand=True)

But the output is not in the format I wanted. Can anyone help me with this? It is a specific problem and I could not find an existing answer. Thanks,

Zep

Upvotes: 0

Views: 49

Answers (3)

Space Impact

Reputation: 13255

First slice off the unwanted data and then use split (assuming the length of the data after Ph is constant):

df[['Contact1','Contact2']] = df.Contact.str[:-8].str.split('[-]',expand=True)

If the length of the data after Ph is not constant, use extract to keep only the letters and spaces:

df[['Contact1','Contact2']] = df.Contact.str.split('[-]',expand=True)
df['Contact2'] = df.Contact2.str.extract('([a-zA-Z ]+)')[0].str.rstrip()

df = pd.DataFrame({'Name':['John','David'],
                   'Occupation':['CEO','Dep Dir'],
                   'Contact':['HP No-Mobile Ph 123:456','Off-Mobile Ph']},)

print(df)
    Name Occupation                  Contact
0   John        CEO  HP No-Mobile Ph 123:456
1  David    Dep Dir            Off-Mobile Ph

df[['Contact1','Contact2']] = df.Contact.str.split('[-]',expand=True)
print(df)

    Name Occupation                  Contact Contact1           Contact2
0   John        CEO  HP No-Mobile Ph 123:456    HP No  Mobile Ph 123:456
1  David    Dep Dir            Off-Mobile Ph      Off          Mobile Ph

df['Contact2'] = df.Contact2.str.extract('([a-zA-Z ]+)')[0].str.rstrip()
print(df)

    Name Occupation                  Contact Contact1   Contact2
0   John        CEO  HP No-Mobile Ph 123:456    HP No  Mobile Ph
1  David    Dep Dir            Off-Mobile Ph      Off  Mobile Ph
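The same two steps can be made a little more defensive. This is only a sketch under two assumptions that are not in the question: a Contact value might contain more than one hyphen, and only the first hyphen should act as the splitter. Passing n=1 to split guarantees exactly two result columns, and expand=False makes extract return a Series directly. The names parts and the sample data are just for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'David'],
                   'Occupation': ['CEO', 'Dep Dir'],
                   'Contact': ['HP No-Mobile Ph 123:456', 'Off-Mobile Ph']})

# n=1 stops after the first hyphen, so at most two columns come back
parts = df['Contact'].str.split('-', n=1, expand=True)
df['Contact1'] = parts[0]

# Keep the leading run of letters and spaces, then trim the trailing space
df['Contact2'] = (parts[1]
                  .str.extract(r'([a-zA-Z ]+)', expand=False)
                  .str.rstrip())

print(df[['Contact1', 'Contact2']])
```

Without n=1, a value with two hyphens would expand into three columns and the two-column assignment would fail.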

Upvotes: 1

jezrael

Reputation: 863146

I believe you need to split by - into 2 columns and then rsplit by the last whitespace:

df[['Contact1','Contact2']] = df.Contact.str.split('-',expand=True)
df['Contact2'] = df['Contact2'].str.rsplit(n=1).str[0]
print (df)
    Name Occupation                  Contact Contact1   Contact2
0   John        CEO  HP No-Mobile Ph 123:456    HP No  Mobile Ph
1  David    Dep Dir    Off-Mobile Ph 152:256      Off  Mobile Ph
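If every Contact value ends in a number of the form digits:digits, as in the sample data, both columns could probably also be pulled out in a single pass with str.extract and named groups. This is a sketch of an alternative, not what the lines above do, and the pattern is an assumption about the data format.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'David'],
                   'Occupation': ['CEO', 'Dep Dir'],
                   'Contact': ['HP No-Mobile Ph 123:456', 'Off-Mobile Ph 152:256']})

# Contact1: everything before the first hyphen
# Contact2: everything after it, up to the whitespace before the trailing number
pattern = r'^(?P<Contact1>[^-]+)-(?P<Contact2>.+?)\s+\d+:\d+$'

# extract returns one column per named group
df[['Contact1', 'Contact2']] = df['Contact'].str.extract(pattern)
print(df)
```

Rows that do not match the pattern (for example, a Contact with no trailing number) get NaN in both new columns, so this is stricter than the split and rsplit approach.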

Upvotes: 1

Naga kiran

Reputation: 4607

df[['Contact1','Contact2']] = df['Contact'].str.split('-',expand=True)
df.Contact2 = df.Contact2.str.split(' ').str[:-1].apply(' '.join)

Out:

              Contact       Name    Occupation  Contact1    Contact2
0   HP No-Mobile Ph 123:456 John    CEO          HP No     Mobile Ph
1   Off-Mobile Ph 152:256   David   Dep Dir       Off      Mobile Ph

Upvotes: 1
