Reputation: 21
I am struggling with pivoting and reshaping some data. I have data as shown below.
nickname:Nick Gavin nickname:Nick job: Teacher nickname:Nick duties: teaching_math nickname:Bob Marcus nickname:Bob job: Musician nickname:Bob duties: plays_piano
I want to change it to:
Nick Teacher teaching_math Gavin Teacher teaching_math Bob Musician plays_piano Marcus Musician plays_piano
Any help would be highly appreciated!
Upvotes: 0
Views: 45
Reputation: 141
Try the below code.
dicts = {}
for i in open('your_data.txt'):
split_i = i.split(' ')
if split_i[0].split(':')[1] not in dicts:
dicts[split_i[0].split(':')[1]] = [split_i[1].rstrip()]
else:
dicts[split_i[0].split(':')[1]].append(split_i[1].replace('job: ', '').replace('duties:', '').strip())
for k, v in dicts.iteritems():
print k, v
Upvotes: 0
Reputation: 28709
#get the names, remove the nickname appendage
df[0] = df[0].str.split(':').str[-1]
#create temp column to get nicknames into another column
df['temp'] = np.where(~df[1].str.contains('[:]'),df[0],np.nan)
#extract words after the ':'
df[1] = df[1].str.lstrip('job:').str.lstrip('duties:').str.strip()
#fillna to the side so each name has job and duties beneath
df = df.ffill(axis=1)
#group by col 0
#combine words
#stack
#split into separate columns
#and drop index 0
final = (df
.groupby(0)
.agg(lambda x: x.str.cat(sep=','))
.stack()
.str.split(',', expand = True)
.reset_index(drop=[0]))
final
0 1 2
0 Marcus Musician plays_piano
1 Bob Musician plays_piano
2 Gavin Teacher teaching_math
3 Nick Teacher teaching_math
Upvotes: 1