Simang Champramary
Simang Champramary

Reputation: 21

how to do certain changes with data having alternative ids?

I am struggling with pivoting and reshaping some data. I have data as shown below.

nickname:Nick   Gavin
nickname:Nick   job: Teacher
nickname:Nick   duties: teaching_math
nickname:Bob    Marcus
nickname:Bob    job: Musician
nickname:Bob    duties: plays_piano

I want to change it to:

Nick Teacher teaching_math
Gavin Teacher teaching_math
Bob Musician plays_piano
Marcus Musician plays_piano

Any help would be highly appreciated!

Upvotes: 0

Views: 45

Answers (2)

San k
San k

Reputation: 141

Try the below code.

dicts = {}
for i in open('your_data.txt'):
    split_i = i.split('   ')
    if split_i[0].split(':')[1] not in dicts:
        dicts[split_i[0].split(':')[1]] = [split_i[1].rstrip()]
    else:
        dicts[split_i[0].split(':')[1]].append(split_i[1].replace('job: ', '').replace('duties:', '').strip())
for k, v in dicts.iteritems():
    print k, v

Upvotes: 0

sammywemmy
sammywemmy

Reputation: 28709

#get the names, remove the nickname appendage
df[0] = df[0].str.split(':').str[-1]

#create temp column to get nicknames into another column
df['temp'] = np.where(~df[1].str.contains('[:]'),df[0],np.nan)

#extract words after the ':'
df[1] = df[1].str.lstrip('job:').str.lstrip('duties:').str.strip()

#fillna to the side so each name has job and duties beneath
df = df.ffill(axis=1)

#group by col 0
#combine words 
#stack
#split into separate columns
#and drop index 0
final = (df
         .groupby(0)
         .agg(lambda x: x.str.cat(sep=','))
         .stack()
         .str.split(',', expand = True)
         .reset_index(drop=[0]))

final

    0          1           2
0   Marcus  Musician    plays_piano
1   Bob     Musician    plays_piano
2   Gavin   Teacher     teaching_math
3   Nick    Teacher     teaching_math

Upvotes: 1

Related Questions