Reputation: 34086
Below is my df:
df = pd.DataFrame({'A': [1, 1, 1, 2],
'B': [2, 2, 2, 3],
'C': [3, 3, 3, 4],
'D': ['Cancer A', 'Cancer B', 'Cancer A', 'Cancer B'],
'E': ['Ecog 9', 'Ecog 1', 'Ecog 0', 'Ecog 1'],
'F': ['val 6', 'val 1', 'val 0', 'val 1'],
'measure_m': [100, 200, 500, 300]})
print(df)
A B C D E F measure_m
0 1 2 3 Cancer A Ecog 9 val 6 100
1 1 2 3 Cancer B Ecog 1 val 1 200
2 1 2 3 Cancer A Ecog 0 val 0 500
3 2 3 4 Cancer B Ecog 1 val 1 300
When I pivot
this df without passing the index, I get this:
In [1280]: df.pivot(index=None, columns = ['A', 'B', 'C', 'D', 'E', 'F'])
Out[1280]:
measure_m
A 1 2
B 2 3
C 3 4
D Cancer A Cancer B Cancer A Cancer B
E Ecog 9 Ecog 1 Ecog 0 Ecog 1
F val 6 val 1 val 0 val 1
0 100.0 NaN NaN NaN
1 NaN 200.0 NaN NaN
2 NaN NaN 500.0 NaN
3 NaN NaN NaN 300.0
I want instead of 4 rows
just 1
single row with all values of measure_m
column, like below:
measure_m
A 1 2
B 2 3
C 3 4
D Cancer A Cancer B Cancer A Cancer B
E Ecog 9 Ecog 1 Ecog 0 Ecog 1
F val 6 val 1 val 0 val 1
0 100.0 200.0 500.0 300.0
How to go about this?
Upvotes: 3
Views: 364
Reputation: 150785
Do you mean:
df.set_index(list(df.columns[:-1])).T
Output:
A 1 2
B 2 3
C 3 4
D Cancer A Cancer B Cancer A Cancer B
E Ecog 9 Ecog 1 Ecog 0 Ecog 1
F val 6 val 1 val 0 val 1
measure_m 100 200 500 300
Update a little modification to match your output:
cols = ['A', 'B', 'C', 'D', 'E', 'F']
(df.set_index(cols)
[['measure_m']] # only need this if you have more columns
.unstack(level=cols)
.to_frame().T
)
Output:
measure_m
A 1 2
B 2 3
C 3 4
D Cancer A Cancer B Cancer A Cancer B
E Ecog 9 Ecog 1 Ecog 0 Ecog 1
F val 6 val 1 val 0 val 1
0 100 200 500 300
Upvotes: 4