Reputation: 41
I have the dataframe below:
df = pd.DataFrame(
[
['A11', 'One', 'Person1', 'Yes'],
['A11', 'One', 'Person2', 'No'],
['B22', 'Two', 'Person3', 'Yes'],
['B22', 'Two', 'Person1', 'No'],
['B22', 'Two', 'Person4', 'No'],
['C33', 'Three', 'Person5', 'Yes']
],
columns=['Code', 'Name', 'Person', 'Valid']
)
Code | Name | Person | Valid | |
---|---|---|---|---|
0 | A11 | One | Person1 | Yes |
1 | A11 | One | Person2 | No |
2 | B22 | Two | Person3 | Yes |
3 | B22 | Two | Person1 | No |
4 | B22 | Two | Person4 | No |
5 | C33 | Three | Person5 | Yes |
What I would like to achieve is:
Code | Name | Person A | Valid A | Person B | Valid B | Person C | Valid C | |
---|---|---|---|---|---|---|---|---|
0 | A11 | One | Person1 | Yes | Person2 | No | NaN | NaN |
1 | B22 | Two | Person3 | Yes | Person1 | No | Person4 | Yes |
2 | C33 | Three | Person5 | Yes | NaN | NaN | NaN | NaN |
Upvotes: 0
Views: 283
Reputation: 11395
Here we go (this is pretty ugly I believe − there might be better out there),
GroupBy
+ cumcount
to create unique indexes for each rowunstack
to give the shape you want>>> from string import ascii_uppercase
>>> person_id = df.groupby('Code').cumcount()
>>> person_id = person_id.map(dict(enumerate(ascii_uppercase)))
>>> unpivoted = df.set_index(['Code', person_id]).unstack(1)
>>> unpivoted
Name Person Valid
A B C A B C A B C
Code
A11 One One NaN Person1 Person2 NaN Yes No NaN
B22 Two Two Two Person3 Person1 Person4 Yes No No
C33 Three NaN NaN Person5 NaN NaN Yes NaN NaN
>>> unpivoted.columns = unpivoted.columns.map(lambda tup: f'{tup[0]} {tup[1]}')
>>> unpivoted.reset_index()
Name A Name B Name C Person A Person B Person C Valid A Valid B Valid C
0 A11 One One NaN Person1 Person2 NaN Yes No NaN
1 B22 Two Two Two Person3 Person1 Person4 Yes No No
2 C33 Three NaN NaN Person5 NaN NaN Yes NaN NaN
If the order of columns is important to you, you could reorder them by swapping the levels and sorting the column multi index (before flattening the columns):
>>> unpivoted = unpivoted.reorder_levels([1, 0], axis='columns').sort_index(axis='columns')
>>> unpivoted.columns = unpivoted.columns.map(lambda tup: f'{tup[1]} {tup[0]}')
>>> unpivoted
Name A Person A Valid A Name B Person B Valid B Name C Person C Valid C
Code
A11 One Person1 Yes One Person2 No NaN NaN NaN
B22 Two Person3 Yes Two Person1 No Two Person4 No
C33 Three Person5 Yes NaN NaN NaN NaN NaN NaN
Upvotes: 0
Reputation: 195418
Try:
from string import ascii_uppercase
l = (
df.groupby(["Code", "Name"])
.agg(list)
.apply(lambda x: list(zip(x["Person"], x["Valid"])), axis=1)
)
data = []
for a in l:
data.append({})
for i, (b, c) in zip(ascii_uppercase, a):
data[-1].update({f"Person {i}": b, f"Valid {i}": c})
print(
pd.concat([l.reset_index()[["Code", "Name"]], pd.DataFrame(data)], axis=1)
)
Prints:
Code Name Person A Valid A Person B Valid B Person C Valid C
0 A11 One Person1 Yes Person2 No NaN NaN
1 B22 Two Person3 Yes Person1 No Person4 No
2 C33 Three Person5 Yes NaN NaN NaN NaN
Upvotes: 2