how to do certain changes with data having alternative ids?

Question

I am struggling with pivoting and reshaping some data. I have data as shown below.

nickname:Nick   Gavin
nickname:Nick   job: Teacher
nickname:Nick   duties: teaching_math
nickname:Bob    Marcus
nickname:Bob    job: Musician
nickname:Bob    duties: plays_piano

I want to change it to:

Nick Teacher teaching_math
Gavin Teacher teaching_math
Bob Musician plays_piano
Marcus Musician plays_piano

Any help would be highly appreciated!

sammywemmy · Accepted Answer

#get the names, remove the nickname appendage
df[0] = df[0].str.split(':').str[-1]

#create temp column to get nicknames into another column
df['temp'] = np.where(~df[1].str.contains('[:]'),df[0],np.nan)

#extract words after the ':'
df[1] = df[1].str.lstrip('job:').str.lstrip('duties:').str.strip()

#fillna to the side so each name has job and duties beneath
df = df.ffill(axis=1)

#group by col 0
#combine words 
#stack
#split into separate columns
#and drop index 0
final = (df
         .groupby(0)
         .agg(lambda x: x.str.cat(sep=','))
         .stack()
         .str.split(',', expand = True)
         .reset_index(drop=[0]))

final

    0          1           2
0   Marcus  Musician    plays_piano
1   Bob     Musician    plays_piano
2   Gavin   Teacher     teaching_math
3   Nick    Teacher     teaching_math

how to do certain changes with data having alternative ids?

Answers (2)

Related Questions