Reputation: 77
Here is my data:
Identification Req
RCFD1797 Violet
BHCKK085 Green
RCFD1797 Green
BHCKK085 Orange
RCFD1797 Blue
BHCKK085 Yellow
BHCKK085 Red
WRSS1797 Green
WRSS1797 Violet
WRSS1797 Blue
RCON1797 Violet
RCON1797 Green
RCON1797 Blue
RCON1797 Indigo
BHDM1797 Violet
BHDM1797 Green
BHDM1797 Blue
BHDM1797 Indigo
The first column is filled with duplicate ID numbers. So for example, "RCFD1797" shows up three times with one requirement on each row. This is what I need it to look like:
Identification Req_1 Req_2 Req_3 Req_4
RCFD1797 Violet Green Blue
BHCKK085 Green Orange Yellow Red
WRSS1797 Green Violet Blue
RCON1797 Violet Green Blue Indigo
BHDM1797 Violet Green Blue Indigo
I have no problem importing my Excel file using Pandas, but I have no idea how to define my dataframe to yield the second table above. Any ideas?
Thanks!
Upvotes: 2
Views: 935
Reputation: 294516
With set_index
df.set_index([
'Identification',
df.groupby('Identification').cumcount().add(1).astype(str).radd('Req_')
]).Req.unstack(fill_value='')
Req_1 Req_2 Req_3 Req_4
Identification
BHCKK085 Green Orange Yellow Red
BHDM1797 Violet Green Blue Indigo
RCFD1797 Violet Green Blue
RCON1797 Violet Green Blue Indigo
WRSS1797 Green Violet Blue
Upvotes: 1
Reputation: 1166
Try this:
test
#Pasting only the partial table here
Identification Req
0 RCFD1797 Violet
1 BHCKK085 Green
2 RCFD1797 Green
3 BHCKK085 Orange
4 RCFD1797 Blue
5 BHCKK085 Yellow
6 BHCKK085 Red
.. ... ...
my_df = test.groupby('Identification')['Req'].apply(list).apply(pd.Series)
my_df.columns = ['Req'+str(i) for i in my_df.columns]
my_df
Req0 Req1 Req2 Req3
Identification
BHCKK085 Green Orange Yellow Red
BHDM1797 Violet Green Blue Indigo
RCFD1797 Violet Green Blue NaN
RCON1797 Violet Green Blue Indigo
WRSS1797 Green Violet Blue NaN
Hope this helps.
Upvotes: 4