Reputation: 93
I have dataframe like below. I need to tranform them in the below format how that be done.
Input
Customer Service
A National_DOM_Express_S1,S2,S3
A National_DOM_ExpressPlus_S1,S2,S3
B National_DOM_ExpressSvr_S1,S2
B National_DOM_Std_S1,S3
C Export_TB_Express_S1,S2,S3
Expected Ouptut
Customer Type Market Product Service
A National DOM Express S1
A National DOM Express S2
A National DOM Express S3
A National DOM ExpressPlus S1
A National DOM ExpressPlus S2
A National DOM ExpressPlus S2
B National DOM ExpressSvr S1
B National DOM ExpressSvr S2
B National DOM Std S1
B National DOM Std S3
C Export TB Express S1
C Export TB Express S2
C Export TB Express S3
How can this can be in python?
Upvotes: 1
Views: 50
Reputation: 28644
#combine the two columns into one
res = (df.Customer.str.cat(df.Service,sep="_")
#split on combination of _ and ,
.str.split("[_,]",expand=True)
#set index on the first three columns
#these are the columns not containing S1, S2, ...
.set_index([0,1,2,3])
#stack and remove the last level as it is irrelevant
.stack()
.droplevel(-1)
)
#give the index names relevant to the final output
res.index = res.index.set_names(['Customer', 'Type', 'Market', 'Product'])
#reset index and give the final column a name as well
res.reset_index(name='Service')
Customer Type Market Product Service
0 A National DOM Express S1
1 A National DOM Express S2
2 A National DOM Express S3
3 A National DOM ExpressPlus S1
4 A National DOM ExpressPlus S2
5 A National DOM ExpressPlus S3
6 B National DOM ExpressSvr S1
7 B National DOM ExpressSvr S2
8 B National DOM Std S1
9 B National DOM Std S3
10 C Export TB Express S1
11 C Export TB Express S2
12 C Export TB Express S3
Upvotes: 1
Reputation: 862681
Use Series.str.split
by _
first for new columns and then split Service
with ,
for lists and use DataFrame.explode
, last for default index is used DataFrame.reset_index
with drop=True
:
df[['Type','Market','Product','Service']] = df['Service'].str.split('_', expand=True)
df = (df.assign(Service = df.pop('Service').str.split(','))
.explode('Service')
.reset_index(drop=True))
print (df)
Customer Type Market Product Service
0 A National DOM Express S1
1 A National DOM Express S2
2 A National DOM Express S3
3 A National DOM ExpressPlus S1
4 A National DOM ExpressPlus S2
5 A National DOM ExpressPlus S3
6 B National DOM ExpressSvr S1
7 B National DOM ExpressSvr S2
8 B National DOM Std S1
9 B National DOM Std S3
10 C Export TB Express S1
11 C Export TB Express S2
12 C Export TB Express S3
Upvotes: 2