pivot data frame in pandas?

Question

people1 trait1 YES
people1 trait2 YES
people1 trait3 NO
people1 trait4 RED
people2 trait1 NO
people2 trait2 YES
people2 trait4 BLACK

etc..

It's possible to create from that table something like this?

        trait1, trait2, trait3, trait4 ...
people1  YES     YES     NO      RED
people2  NO      YES     -       BLACK
people3  -        -      YES     BLUE

The file is too big to do that in excel, I tried in pandas, but I can't find help in this case. I found pd.pivot_table funcion but I can't build working code. I tried and got various erors (99% my fault).

Can someone explain me how to use it in my case? Or maybe is better option than pandas.pivot?+

EDIT

I rebuild my frame:
1      'interpretation'     'trait'
p1           YES               t1
p1           BLACK             t2
p1           NO                t3
p2           NO                t1
p2           RED               t2
p2           NO                t3

And I use suggestion:

data1.pivot_table(index=1, columns="name", values='trait', aggfunc=','.join, fill_value='-').

And I got:

TypeError: sequence item 0: expected str instance, float found

If I change

data1.pivot_table(index=1, columns="trait", values='value', aggfunc=','.join, fill_value='-').

I got bad order table but without error:

     p1      p2    p3    p4
YES  trait1  t1
YES  t1      t2 etc.
NO
RED
No
...

So i think, the first option is correct, but I cant repair that error. When I dtype df it return (O) for all cols.

jezrael · Accepted Answer

I think problem is missing values in column trait, so join function failed. So possible solution is replace missing values to empty strings:

print (data1)
    1   name trait
0  p1    YES   NaN <- missing value
1  p1  BLACK    t2
2  p1     NO    t3
3  p2     NO    t1
4  p2    RED    t2
5  p2     NO    t3

data1['trait'] = data1['trait'].fillna('')
df = data1.pivot_table(index=1, 
                       columns="name", 
                       values='trait', 
                       aggfunc=','.join, 
                       fill_value='-')
print (df)
1      p1     p2
name            
BLACK  t2      -
NO     t3  t1,t3
RED     -     t2
YES            -

Also if want convert index to column:

data1['trait'] = data1['trait'].fillna('')
df = (data1.pivot_table(index=1, 
                       columns="name", 
                       values='trait', 
                       aggfunc=','.join, 
                       fill_value='-')
           .reset_index()
           .rename_axis(None, axis=1))
print (df)
    name  p1     p2
0  BLACK  t2      -
1     NO  t3  t1,t3
2    RED   -     t2
3    YES          -

pivot data frame in pandas?

Answers (1)

Related Questions