Reputation: 2736
I'm having trouble pivoting/reshaping my dataframe. After a melt()
operation, I have the following df:
|----|------|------|------------|
| ID | Rank | Var | Val |
|----|------|------|------------|
| 1 | 1 | date | 2020-01-01 |
|----|------|------|------------|
| 1 | 2 | date | 2020-02-01 |
|----|------|------|------------|
| 2 | 1 | date | 2020-01-01 |
|----|------|------|------------|
| 2 | 2 | date | 2020-02-01 |
|----|------|------|------------|
| 1 | 1 | amt | 320 |
|----|------|------|------------|
| 1 | 2 | amt | 480 |
|----|------|------|------------|
| 2 | 1 | amt | 620 |
|----|------|------|------------|
| 2 | 2 | amt | 400 |
|----|------|------|------------|
The result I am looking for would be to convert the rank values into columns:
|----|------|------------|------------|
| ID | Var | 1 | 2 |
|----|------|------------|------------|
| 1 | date | 2020-01-01 | 2020-02-01 |
| |------|------------|------------|
| | amt | 320 | 480 |
|----|------|------------|------------|
| 2 | date | 2020-01-01 | 2020-02-01 |
| |------|------------|------------|
| | amt | 620 | 400 |
|----|------|------------|------------|
Immediately trying to unstack(level='Rank')
results in a
ValueError: Index contains duplicate entries, cannot reshape
Ok, we do have duplicate IDs
, let's groupby
to consolidate:
df.set_index(['ID', 'Rank']).groupby(['ID', 'Rank']).apply(lambda x: x)
The apply()
is just so we can return a dataframe and preview the results, which are:
|----|------|------|------------|
| ID | Rank | Var | Val |
|----|------|------|------------|
| 1 | 1 | date | 2020-01-01 |
| |------|------|------------|
| | 1 | amt | 320 |
| |------|------|------------|
| | 2 | date | 2020-02-01 |
| |------|------|------------|
| | 2 | amt | 480 |
|----|------|------|------------|
| 2 | 1 | date | 2020-01-01 |
| |------|------|------------|
| | 2 | date | 2020-02-01 |
| |------|------|------------|
| | 1 | amt | 620 |
| |------|------|------------|
| | 2 | amt | 400 |
|----|------|------|------------|
This is closer. Now we just need to convert Rank into cols:
df.set_index(['ID', 'Rank']).groupby(['ID', 'Rank']).apply(lambda x: x).unstack(level='Rank')
ValueError: Index contains duplicate entries, cannot reshape
????
I also tried using pivot_table
, but not all of the columns are numeric (the above example is simplified, but you can see date in here, which throws off pivot_table).
I seem to be just one method away, but I've tried unstacking, reindexing, pivoting, explode()
-ing, I just can't figure out this last part...
How can I get the desired dataframe?
Thank you!
Upvotes: 0
Views: 2034
Reputation: 2409
your issue is that the index keys must be unique to unstack.
first, let's recreate this toy example
import pandas as pd
from IPython.display import display
df= pd.DataFrame(columns=['ID', 'Rank','Var', 'Val'], data=[
[1,1,'date', '2020-01-01'],
[1,2,'date','2020-02-01'],
[2,1,'date','2020-02-01'],
[2,2,'date','2020-02-01'],
[1,1,'amt',320],
[1,2,'amt',480],
[2,1,'amt',620],
[2,2,'amt',400],
])
df = df.set_index(['ID', 'Rank'])
display(df)
now, lets solve
lets put unique keys in the index, namely the ID
and Var
fields.
now unstacking is trivial
df.set_index('Var', append=True).unstack('Rank')
Upvotes: 1