Reputation: 2647
Suppose I have the following DataFrame:
>>> cols = ['model', 'parameter', 'condition', 'value']
>>> df = pd.DataFrame([['BMW', '0-60', 'rain', '7'], ['BMW', '0-60', 'sun', '7'],
['BMW','mpg', 'rain','25'],
['BMW', 'stars', 'rain','5'],
['Toyota', '0-60', 'rain','9'],
['Toyota','mpg', 'rain','40'],
['Toyota', 'stars', 'rain','4']], columns=cols)
>>> df
model parameter condition value
0 BMW 0-60 rain 7
1 BMW 0-60 sun 7
2 BMW mpg rain 25
3 BMW stars rain 5
4 Toyota 0-60 rain 9
5 Toyota mpg rain 40
6 Toyota stars rain 4
This is a list of performance metrics for various cars at different conditions. This is a made up data set, of course, but its representative of my problem.
What I ultimately want is to have observation for a given condition on its own row, and each metric on its own column. This would look something like this:
parameter condition 0-60 mpg stars
model
0 BMW rain 7 25 5
1 BMW sun 7 NaN NaN
2 Toyota rain 9 40 4
Note that I just made up the format above. I don't know if Pandas would generate something exactly like that, but that's the general idea. I would also of course transform the "condition" into a Boolean array and fill in the NaNs.
My problem is that when I try to use the pivot method I get an error. I think this is because my "column" key is repeated (because I have BMW 0-60 stats for the rain and for the sun conditions).
df.pivot(index='model',columns='parameter')
ValueError: Index contains duplicate entries, cannot reshape
Does anyone know of a slick way to do this? I'm finding a lot of these Pandas reshaping methods to be quite obtuse.
Upvotes: 0
Views: 1077
Reputation: 176770
You can get the result you want using pivot_table
and passing the following parameters:
>>> df.pivot_table(index=['model', 'condition'], values='value', columns='parameter')
parameter 0-60 mpg stars
model condition
BMW rain 7 25 5
sun 7 NaN NaN
Toyota rain 9 40 4
(You may need to ensure the "value" column has numeric types first or else you can pass aggfunc=lambda x: x
in the pivot_table
function to get around this requirement.)
Upvotes: 0
Reputation: 19104
You can just change the index and unstack it...
df.set_index(['model', 'condition', 'parameter']).unstack()
returns
value
parameter 0-60 mpg stars
model condition
BMW rain 7 25 5
sun 7 NaN NaN
Toyota rain 9 40 4
Upvotes: 1