Reputation: 551
I have a pandas DataFrame. How can I convert the first DataFrame (Input) into the second one (Expected output)?
I tried the following, but it keeps raising ValueError: Index contains duplicate entries, cannot reshape
res_df = book_df.pivot(index='book_id', columns='field', values='field_value')
I think this happens because book_id=1 has multiple title values (X and Y). I would like to join the values with a comma in these cases.
Input
| id | book_id | field | field_value |
|----|---------|--------|--------------|
| 1 | 1 | title | X |
| 2 | 1 | title | Y |
| 3 | 1 | bsn | 999 |
| 4 | 2 | title | Harry Potter |
| 5 | 3 | title | Hello World |
| 6 | 3 | author | John Doe |
Expected output
| id | book_id | title | bsn | author |
|----|---------|--------------|-----|----------|
| 1 | 1 | X,Y | 999 | |
| 2 | 2 | Harry Potter | | |
| 3 | 3 | Hello World | | John Doe |
Upvotes: 0
Views: 610
Reputation: 5026
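Your pivot was nearly correct. pivot cannot reshape when an index/column pair occurs more than once, whereas pivot_table aggregates the duplicates. I used pivot_table and passed a string join as the aggfunc argument: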
(book_df.pivot_table(index='book_id', columns='field', values='field_value', aggfunc=','.join, fill_value='')
.reset_index()
.rename_axis(None, axis=1)[['book_id','title','bsn','author']])
Out:
   book_id         title  bsn    author
0        1           X,Y  999
1        2  Harry Potter
2        3   Hello World       John Doe
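For anyone who wants to reproduce this, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the Input table in the question, and it assumes field_value holds strings (so 999 is written as "999"). The astype(str) step is my own addition as a precaution, since ','.join fails on non-string values; it is not part of the snippet above.

```python
import pandas as pd

# Rebuild the question's input table (values assumed to be strings).
book_df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "book_id": [1, 1, 1, 2, 3, 3],
    "field": ["title", "title", "bsn", "title", "title", "author"],
    "field_value": ["X", "Y", "999", "Harry Potter", "Hello World", "John Doe"],
})

res_df = (
    # Precaution (assumption): make sure every value is a string before joining.
    book_df.assign(field_value=book_df["field_value"].astype(str))
    # Duplicate (book_id, field) pairs are aggregated by joining with a comma.
    .pivot_table(index="book_id", columns="field", values="field_value",
                 aggfunc=",".join, fill_value="")
    .reset_index()
    # Drop the leftover "field" label on the columns axis and fix column order.
    .rename_axis(None, axis=1)[["book_id", "title", "bsn", "author"]]
)

print(res_df)
```

The explicit column list at the end only fixes the column order; if the set of fields is not known in advance, that selection can be dropped and the columns will come out sorted alphabetically.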
Upvotes: 1