the_tc
the_tc

Reputation: 41

How to convert the table in pandas?

I have the dataframe below:

df = pd.DataFrame(
    [
        ['A11', 'One', 'Person1', 'Yes'],
        ['A11', 'One', 'Person2', 'No'], 
        ['B22', 'Two', 'Person3', 'Yes'],
        ['B22', 'Two', 'Person1', 'No'],
        ['B22', 'Two', 'Person4', 'No'],
        ['C33', 'Three', 'Person5', 'Yes']
    ],
    columns=['Code', 'Name', 'Person', 'Valid']
)
Code Name Person Valid
0 A11 One Person1 Yes
1 A11 One Person2 No
2 B22 Two Person3 Yes
3 B22 Two Person1 No
4 B22 Two Person4 No
5 C33 Three Person5 Yes

What I would like to achieve is:

Code Name Person A Valid A Person B Valid B Person C Valid C
0 A11 One Person1 Yes Person2 No NaN NaN
1 B22 Two Person3 Yes Person1 No Person4 Yes
2 C33 Three Person5 Yes NaN NaN NaN NaN

Upvotes: 0

Views: 283

Answers (2)

Cimbali
Cimbali

Reputation: 11395

Here we go (this is pretty ugly I believe − there might be better out there),

  • using GroupBy + cumcount to create unique indexes for each row
  • using unstack to give the shape you want
  • then flattening the columns MultiIndex
>>> from string import ascii_uppercase
>>> person_id = df.groupby('Code').cumcount()
>>> person_id = person_id.map(dict(enumerate(ascii_uppercase)))
>>> unpivoted = df.set_index(['Code', person_id]).unstack(1)
>>> unpivoted
       Name             Person                   Valid          
          A    B    C        A        B        C     A    B    C
Code                                                            
A11     One  One  NaN  Person1  Person2      NaN   Yes   No  NaN
B22     Two  Two  Two  Person3  Person1  Person4   Yes   No   No
C33   Three  NaN  NaN  Person5      NaN      NaN   Yes  NaN  NaN
>>> unpivoted.columns = unpivoted.columns.map(lambda tup: f'{tup[0]} {tup[1]}')
>>> unpivoted.reset_index()
     Name A Name B Name C Person A Person B Person C Valid A Valid B Valid C
0  A11    One    One    NaN  Person1  Person2      NaN     Yes      No     NaN
1  B22    Two    Two    Two  Person3  Person1  Person4     Yes      No      No
2  C33  Three    NaN    NaN  Person5      NaN      NaN     Yes     NaN     NaN

If the order of columns is important to you, you could reorder them by swapping the levels and sorting the column multi index (before flattening the columns):

>>> unpivoted = unpivoted.reorder_levels([1, 0], axis='columns').sort_index(axis='columns')
>>> unpivoted.columns = unpivoted.columns.map(lambda tup: f'{tup[1]} {tup[0]}')
>>> unpivoted
     Name A Person A Valid A Name B Person B Valid B Name C Person C Valid C
Code                                                                        
A11     One  Person1     Yes    One  Person2      No    NaN      NaN     NaN
B22     Two  Person3     Yes    Two  Person1      No    Two  Person4      No
C33   Three  Person5     Yes    NaN      NaN     NaN    NaN      NaN     NaN

Upvotes: 0

Andrej Kesely
Andrej Kesely

Reputation: 195418

Try:

from string import ascii_uppercase

l = (
    df.groupby(["Code", "Name"])
    .agg(list)
    .apply(lambda x: list(zip(x["Person"], x["Valid"])), axis=1)
)

data = []
for a in l:
    data.append({})
    for i, (b, c) in zip(ascii_uppercase, a):
        data[-1].update({f"Person {i}": b, f"Valid {i}": c})

print(
    pd.concat([l.reset_index()[["Code", "Name"]], pd.DataFrame(data)], axis=1)
)

Prints:

  Code   Name Person A Valid A Person B Valid B Person C Valid C
0  A11    One  Person1     Yes  Person2      No      NaN     NaN
1  B22    Two  Person3     Yes  Person1      No  Person4      No
2  C33  Three  Person5     Yes      NaN     NaN      NaN     NaN

Upvotes: 2

Related Questions