Ria Alves
Ria Alves

Reputation: 93

How to split sub elements of a columns into seperate colums and row elements?

I have dataframe like below. I need to tranform them in the below format how that be done.

Input

Customer   Service
A          National_DOM_Express_S1,S2,S3
A          National_DOM_ExpressPlus_S1,S2,S3
B          National_DOM_ExpressSvr_S1,S2
B          National_DOM_Std_S1,S3
C          Export_TB_Express_S1,S2,S3

Expected Ouptut

Customer   Type      Market  Product   Service
A          National   DOM    Express       S1
A          National   DOM    Express       S2
A          National   DOM    Express       S3
A          National   DOM    ExpressPlus   S1
A          National   DOM    ExpressPlus   S2
A          National   DOM    ExpressPlus   S2
B          National   DOM    ExpressSvr    S1
B          National   DOM    ExpressSvr    S2
B          National   DOM    Std           S1
B          National   DOM    Std           S3
C          Export     TB     Express       S1
C          Export     TB     Express       S2
C          Export     TB     Express       S3

How can this can be in python?

Upvotes: 1

Views: 50

Answers (2)

sammywemmy
sammywemmy

Reputation: 28644

       #combine the two columns into one
res = (df.Customer.str.cat(df.Service,sep="_")
        #split on combination of _ and ,
       .str.split("[_,]",expand=True)
        #set index on the first three columns
        #these are the columns not containing S1, S2, ...
       .set_index([0,1,2,3])
        #stack and remove the last level as it is irrelevant
       .stack()
       .droplevel(-1)
      )
#give the index names relevant to the final output
res.index = res.index.set_names(['Customer', 'Type', 'Market', 'Product'])
#reset index and give the final column a name as well
res.reset_index(name='Service')

    Customer    Type    Market  Product     Service
0     A        National DOM     Express     S1
1     A        National DOM     Express     S2
2     A        National DOM     Express     S3
3     A        National DOM     ExpressPlus S1
4     A        National DOM     ExpressPlus S2
5     A        National DOM     ExpressPlus S3
6     B        National DOM     ExpressSvr  S1
7     B        National DOM     ExpressSvr  S2
8     B        National DOM     Std         S1
9     B        National DOM     Std         S3
10    C        Export   TB      Express     S1
11    C        Export   TB      Express     S2
12    C        Export   TB      Express     S3

Upvotes: 1

jezrael
jezrael

Reputation: 862681

Use Series.str.split by _ first for new columns and then split Service with , for lists and use DataFrame.explode, last for default index is used DataFrame.reset_index with drop=True:

df[['Type','Market','Product','Service']] = df['Service'].str.split('_', expand=True)
df = (df.assign(Service = df.pop('Service').str.split(','))
        .explode('Service')
        .reset_index(drop=True))
print (df)
   Customer      Type Market      Product Service
0         A  National    DOM      Express      S1
1         A  National    DOM      Express      S2
2         A  National    DOM      Express      S3
3         A  National    DOM  ExpressPlus      S1
4         A  National    DOM  ExpressPlus      S2
5         A  National    DOM  ExpressPlus      S3
6         B  National    DOM   ExpressSvr      S1
7         B  National    DOM   ExpressSvr      S2
8         B  National    DOM          Std      S1
9         B  National    DOM          Std      S3
10        C    Export     TB      Express      S1
11        C    Export     TB      Express      S2
12        C    Export     TB      Express      S3

Upvotes: 2

Related Questions