user2744315
user2744315

Reputation: 77

Move a column with duplicate values into a single row Pandas

Here is my data:

Identification  Req
    RCFD1797    Violet
    BHCKK085    Green
    RCFD1797    Green
    BHCKK085    Orange
    RCFD1797    Blue
    BHCKK085    Yellow
    BHCKK085    Red
    WRSS1797    Green
    WRSS1797    Violet
    WRSS1797    Blue
    RCON1797    Violet
    RCON1797    Green
    RCON1797    Blue
    RCON1797    Indigo
    BHDM1797    Violet
    BHDM1797    Green
    BHDM1797    Blue
    BHDM1797    Indigo

The first column is filled with duplicate ID numbers. So for example, "RCFD1797" shows up three times with one requirement on each row. This is what I need it to look like:

Identification  Req_1   Req_2   Req_3   Req_4
    RCFD1797    Violet  Green   Blue    
    BHCKK085    Green   Orange  Yellow  Red
    WRSS1797    Green   Violet  Blue    
    RCON1797    Violet  Green   Blue    Indigo
    BHDM1797    Violet  Green   Blue    Indigo

I have no problem importing my Excel file using Pandas, but I have no idea how to define my dataframe to yield the second table above. Any ideas?

Thanks!

Upvotes: 2

Views: 935

Answers (2)

piRSquared
piRSquared

Reputation: 294516

With set_index

df.set_index([
    'Identification',
    df.groupby('Identification').cumcount().add(1).astype(str).radd('Req_')
]).Req.unstack(fill_value='')

                 Req_1   Req_2   Req_3   Req_4
Identification                                
BHCKK085         Green  Orange  Yellow     Red
BHDM1797        Violet   Green    Blue  Indigo
RCFD1797        Violet   Green    Blue        
RCON1797        Violet   Green    Blue  Indigo
WRSS1797         Green  Violet    Blue        

Upvotes: 1

Sagar Dawda
Sagar Dawda

Reputation: 1166

Try this:

test
#Pasting only the partial table here

    Identification  Req
0   RCFD1797        Violet
1   BHCKK085        Green
2   RCFD1797        Green
3   BHCKK085        Orange
4   RCFD1797        Blue
5   BHCKK085        Yellow
6   BHCKK085        Red
..  ...             ...

my_df = test.groupby('Identification')['Req'].apply(list).apply(pd.Series)
my_df.columns = ['Req'+str(i) for i in my_df.columns]
my_df

                Req0    Req1    Req2    Req3
Identification              
BHCKK085        Green   Orange  Yellow  Red
BHDM1797        Violet  Green   Blue    Indigo
RCFD1797        Violet  Green   Blue    NaN
RCON1797        Violet  Green   Blue    Indigo
WRSS1797        Green   Violet  Blue    NaN

Hope this helps.

Upvotes: 4

Related Questions