dsal1951
dsal1951

Reputation: 1750

Pandas - Only Pivot Select Rows

I have a table where two different types of columns have been stacked into the field column - attributes and questions.

+-------+------------+-------+
|  id   |   field    | value |
+-------+------------+-------+
| 52394 | gender     | M     |
| 52394 | age        | 24    |
| 52394 | question_1 | 2     |
| 52394 | question_2 | 1     |
+-------+------------+-------+

I want to reshape it so that gender and age become columns while question_1 and question_2 remain stacked.

+-------+--------+-----+------------+-------+
|  id   | gender | age |   field    | value |
+-------+--------+-----+------------+-------+
| 52394 | M      |  24 | question_1 |     2 |
| 52394 | M      |  24 | question_2 |     1 |
+-------+--------+-----+------------+-------+

Any ideas on how to do this?

Upvotes: 2

Views: 324

Answers (1)

FatihAkici
FatihAkici

Reputation: 5109

This would be my strategy:

Apply pivot to your df where field is gender or age, save as df1. Select the df where field is not gender or age, save as df2. Then merge the two (df1 and df2) on id. Here is my full code:

import pandas as pd
import sys
if sys.version_info[0] < 3:
    from StringIO import StringIO
else:
    from io import StringIO

# Create df
rawText = StringIO("""
  id     field     value 
 52394  gender      M     
 52394  age         24    
 52394  question_1  2     
 52394  question_2  1     
""")
df = pd.read_csv(rawText, sep = "\s+")
df1 = df[df['field'].isin(['gender','age'])]
df1 = df1.pivot(index = 'id', columns = 'field', values = 'value').reset_index()
df2 = df[~df['field'].isin(['gender','age'])]
df1.merge(df2)

The result is:

      id age gender       field value
0  52394  24      M  question_1     2
1  52394  24      M  question_2     1

Upvotes: 3

Related Questions