konichiwa

Reputation: 551

How to combine (comma-separated) row values in a single column in pandas?

I have a pandas data frame. How can I convert the first data frame into the second one?

I tried the following, but it keeps throwing "ValueError: Index contains duplicate entries, cannot reshape":

res_df = book_df.pivot(index='book_id', columns='field', values='field_value')

I think this happens because book_id=1 has multiple title values (X and Y). In these cases I would like to join the values with commas.

Input

| id | book_id | field  | field_value  |
|----|---------|--------|--------------|
| 1  | 1       | title  | X            |
| 2  | 1       | title  | Y            |
| 3  | 1       | bsn    | 999          |
| 4  | 2       | title  | Harry Potter |
| 5  | 3       | title  | Hello World  |
| 6  | 3       | author | John Doe     |
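One way to confirm the suspected cause is to list the rows that share a (book_id, field) pair. The sketch below rebuilds the input table by hand. Storing every field_value as a string (including 999) is an assumption, not something stated in the question.

```python
import pandas as pd

# Input table from the question; field_value is kept as strings (an assumption).
book_df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "book_id": [1, 1, 1, 2, 3, 3],
    "field": ["title", "title", "bsn", "title", "title", "author"],
    "field_value": ["X", "Y", "999", "Harry Potter", "Hello World", "John Doe"],
})

# Rows whose (book_id, field) pair occurs more than once. pivot() has no way
# to fit several values into one cell, which is why it refuses to reshape.
dupes = book_df[book_df.duplicated(subset=["book_id", "field"], keep=False)]
print(dupes)
```

Only the two title rows for book_id=1 should show up, which matches the explanation above.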

Expected output

| id | book_id | title        | bsn | author   |
|----|---------|--------------|-----|----------|
| 1  | 1       | X,Y          | 999 |          |
| 2  | 2       | Harry Potter |     |          |
| 3  | 3       | Hello World  |     | John Doe |

Upvotes: 0

Views: 610

Answers (1)

Michael Szczesny

Reputation: 5026

Your pivot was nearly correct. I used pivot_table instead and passed a string join as the aggfunc argument, so that multiple values for the same book_id and field are joined with commas:

(book_df.pivot_table(index='book_id', columns='field', values='field_value', aggfunc=','.join, fill_value='')
  .reset_index()
  .rename_axis(None, axis=1)[['book_id','title','bsn','author']])

Out:

   book_id         title  bsn    author
0        1           X,Y  999          
1        2  Harry Potter               
2        3   Hello World       John Doe
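For anyone who wants to run this end to end, here is a self-contained sketch of the same approach. The DataFrame is rebuilt by hand from the question's input table, and the astype(str) cast is an added precaution (an assumption about the data, not part of the snippet above): ','.join raises a TypeError if field_value holds non-string values such as the integer 999.

```python
import pandas as pd

# Input table from the question, rebuilt by hand.
book_df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "book_id": [1, 1, 1, 2, 3, 3],
    "field": ["title", "title", "bsn", "title", "title", "author"],
    "field_value": ["X", "Y", "999", "Harry Potter", "Hello World", "John Doe"],
})

# Precaution (assumption): make sure every value is a string before joining.
book_df["field_value"] = book_df["field_value"].astype(str)

result = (
    book_df.pivot_table(
        index="book_id",
        columns="field",
        values="field_value",
        aggfunc=",".join,   # concatenate duplicates instead of raising
        fill_value="",      # empty string where a book has no such field
    )
    .reset_index()                  # turn book_id back into a regular column
    .rename_axis(None, axis=1)      # drop the leftover "field" columns label
    [["book_id", "title", "bsn", "author"]]  # restore the desired column order
)
print(result)
```

Without the final column selection, pivot_table returns the field columns in alphabetical order (author, bsn, title).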

Upvotes: 1
