lorelai
lorelai

Reputation: 83

pandas rows as column names - reshaping dataframe

I have table in the following format:

  user_id   total_spent purchase_date   purchase_id aisle   department  item_cost
    33      279.3       1/29/18         26          gn      1           25.7
    33      279.3       1/29/18         26          aj      2           17
    33      279.3       1/29/18         26          bs      2           31.7
    33      279.3       1/29/18         26          jt      3           30
    33      279.3       1/29/18         26          tj      3           8.1
    33      279.3       1/29/18         26          ga      4           60.6
    33      279.3       1/29/18         26          tw      4           35.9
    33      279.3       1/29/18         26          aw      5           34.1
    33      279.3       1/29/18         26          te      5           36.2
    26      331.7       11/15/17        85          je      1           93.5
    26      331.7       11/15/17        85          tg      2           21.1
    26      331.7       11/15/17        85          ds      2           23.2
    26      331.7       11/15/17        85          jw      3           48.8
    26      331.7       11/15/17        85          df      3           10.2
    26      331.7       11/15/17        85          lj      4           56.2
    26      331.7       11/15/17        85          oa      4           7
    26      331.7       11/15/17        85          aw      5           32.4
    26      331.7       11/15/17        85          cm      5           39.3

And would like to convert it into each department value as a column name like department_1, department_2 etc. Same as aisle and item cost: dept_aisle_gn, dept_aisle_df etc followed by corresponding cost as columns.

In other words, in the end, I would like to have 2 rows for 2 user_ids and the rest of the information will be shown in the columns rather than rows.

Originally I pulled the data from redshift and I will manipulate it with python pandas. Unfortunately, pivot or crosstab is not working for redshift. I tried pandas pivot but seems not working for me.

Thank you.

Upvotes: 0

Views: 119

Answers (1)

jpp
jpp

Reputation: 164623

pd.pivot_table supports this.

pd.pivot_table(df,
               index=['user_id', 'total_spent', 'purchase_date', 'purchase_id'],
               columns=['aisle', 'department'],
               values='item_cost',
               aggfunc=np.sum,
               fill_value=0)

# aisle                                          aj    aw    bs    cm    df  \
# department                                      2     5     2     5     3   
# user_id total_spent purchase_date purchase_id                               
# 26      331.7       11/15/17      85            0  32.4   0.0  39.3  10.2   
# 33      279.3       1/29/18       26           17  34.1  31.7   0.0   0.0   

# aisle                                            ds    ga    gn    je  jt  \
# department                                        2     4     1     1   3   
# user_id total_spent purchase_date purchase_id                               
# 26      331.7       11/15/17      85           23.2   0.0   0.0  93.5   0   
# 33      279.3       1/29/18       26            0.0  60.6  25.7   0.0  30   

# aisle                                            jw    lj oa    te    tg   tj  \
# department                                        3     4  4     5     2    3   
# user_id total_spent purchase_date purchase_id                                   
# 26      331.7       11/15/17      85           48.8  56.2  7   0.0  21.1  0.0   
# 33      279.3       1/29/18       26            0.0   0.0  0  36.2   0.0  8.1   

# aisle                                            tw  
# department                                        4  
# user_id total_spent purchase_date purchase_id        
# 26      331.7       11/15/17      85            0.0  
# 33      279.3       1/29/18       26           35.9    

Upvotes: 1

Related Questions