Reputation: 1185
people1 trait1 YES
people1 trait2 YES
people1 trait3 NO
people1 trait4 RED
people2 trait1 NO
people2 trait2 YES
people2 trait4 BLACK
etc..
It's possible to create from that table something like this?
trait1, trait2, trait3, trait4 ...
people1 YES YES NO RED
people2 NO YES - BLACK
people3 - - YES BLUE
The file is too big to do that in excel, I tried in pandas, but I can't find help in this case. I found pd.pivot_table funcion but I can't build working code. I tried and got various erors (99% my fault).
Can someone explain me how to use it in my case? Or maybe is better option than pandas.pivot?+
EDIT
I rebuild my frame:
1 'interpretation' 'trait'
p1 YES t1
p1 BLACK t2
p1 NO t3
p2 NO t1
p2 RED t2
p2 NO t3
And I use suggestion:
data1.pivot_table(index=1, columns="name", values='trait', aggfunc=','.join, fill_value='-').
And I got:
TypeError: sequence item 0: expected str instance, float found
If I change
data1.pivot_table(index=1, columns="trait", values='value', aggfunc=','.join, fill_value='-').
I got bad order table but without error:
p1 p2 p3 p4
YES trait1 t1
YES t1 t2 etc.
NO
RED
No
...
So i think, the first option is correct, but I cant repair that error. When I dtype df it return (O) for all cols.
Upvotes: 1
Views: 64
Reputation: 862511
I think problem is missing values in column trait
, so join
function failed. So possible solution is replace missing values to empty strings:
print (data1)
1 name trait
0 p1 YES NaN <- missing value
1 p1 BLACK t2
2 p1 NO t3
3 p2 NO t1
4 p2 RED t2
5 p2 NO t3
data1['trait'] = data1['trait'].fillna('')
df = data1.pivot_table(index=1,
columns="name",
values='trait',
aggfunc=','.join,
fill_value='-')
print (df)
1 p1 p2
name
BLACK t2 -
NO t3 t1,t3
RED - t2
YES -
Also if want convert index to column:
data1['trait'] = data1['trait'].fillna('')
df = (data1.pivot_table(index=1,
columns="name",
values='trait',
aggfunc=','.join,
fill_value='-')
.reset_index()
.rename_axis(None, axis=1))
print (df)
name p1 p2
0 BLACK t2 -
1 NO t3 t1,t3
2 RED - t2
3 YES -
Upvotes: 1