raheel
raheel

Reputation: 164

split string in series based on another column value on the same index

Hello I have pandas dataframe

    code                                 LocationCod
0.  999010001-R-XQK33-4658-999-          R-XQK33
1.  602444201-N-FKD2-000257999-999-      N-FKD2
2.  602016601-N-TAS-777-999-             N-TAS
3.  602010001-345-999-999-               345

I want to split the value in column Code based on Value in Column LocationCod which should be expand to new columns CategCode and MainCode.required table would looklike this.

    code                                 LocationCod     CategCode           MainCode
0.  999010001-R-XQK33-4658-999-          R-XQK33         4658-999-           999010001
1.  602444201-N-FKD2-000257999-999-      N-FKD2          000257999-999-      602444201
2.  602016601-N-TAS-777-999-             N-TAS           777-999-            602016601
3.  602010001-345-999-999-               345             999-999-            602010001

Please help.!

Upvotes: 1

Views: 562

Answers (3)

Renaud
Renaud

Reputation: 2819

you can try with:

df = pd.DataFrame({
'Code': ['999010001-R-XQK33-4658-999-', '602444201-N-FKD2-000257999-999-','602016601-N-TAS-777-999-','602010001-345-999-999- '],
'LocationCod': ['R-XQK33','N-FKD2','N-TAS',345]})



df['MainCode']=df['Code'].apply(lambda x:x.split('-')[0]+'-')
df['CategCode']=df['Code'].apply(lambda x:'-'.join(x.split('-')[-3:]))
print(df)

result:

                              Code LocationCod    MainCode       CategCode
0      999010001-R-XQK33-4658-999-     R-XQK33  999010001-       4658-999-
1  602444201-N-FKD2-000257999-999-      N-FKD2  602444201-  000257999-999-
2         602016601-N-TAS-777-999-       N-TAS  602016601-        777-999-
3          602010001-345-999-999-          345  602010001-       999-999-

Upvotes: 0

Mayank Porwal
Mayank Porwal

Reputation: 34056

You can use df.apply with a custom lambda:

In [2270]: df[['MainCode', 'CategCode']] = pd.DataFrame(df.apply(lambda x: x['code'].split(x['LocationCod']), 1).tolist())

In [2271]: df
Out[2271]: 
                              code LocationCod    MainCode        CategCode
0      999010001-R-XQK33-4658-999-     R-XQK33  999010001-       -4658-999-
1  602444201-N-FKD2-000257999-999-      N-FKD2  602444201-  -000257999-999-
2         602016601-N-TAS-777-999-       N-TAS  602016601-        -777-999-
3           602010001-345-999-999-         345  602010001-        -999-999-

You can remove the - from new columns to match your expected output like this:

In [2281]: df.MainCode = df.MainCode.str[:-1]
In [2280]: df.CategCode = df.CategCode.str[1:]

In [2282]: df
Out[2282]: 
                              code LocationCod   MainCode       CategCode
0      999010001-R-XQK33-4658-999-     R-XQK33  999010001       4658-999-
1  602444201-N-FKD2-000257999-999-      N-FKD2  602444201  000257999-999-
2         602016601-N-TAS-777-999-       N-TAS  602016601        777-999-
3           602010001-345-999-999-         345  602010001        999-999-

Upvotes: 1

Nickj
Nickj

Reputation: 1012

Try this:

df['CategCode']=[df.code.values[i].split(df.LocationCod.values[i])[-1][1:] for i in range(len(df))]
df['MainCode']=[df.code.values[i].split(df.LocationCod.values[i])[0][:-1] for i in range(len(df))]

Upvotes: 0

Related Questions