Reputation: 35
I have a DataFrame that contains a column with dates which I'd like to use as my DataFrame's index. The dates in that column are not necessarily unique - sometimes there might be duplicates. I wish to append duplicates as new columns. Dates that are unique can just contain NaN (or whatever) for the newly appended columns.
To clarify I'll provide an example:
import pandas as pd
data = [
{'Date':'01-01-2020','A':0,'B':1},
{'Date':'01-01-2020','A':2,'B':3},
{'Date':'02-01-2020','A':4,'B':5},
{'Date':'02-01-2020','A':6,'B':7},
{'Date':'03-01-2020','A':8,'B':9},
]
df = pd.DataFrame(data)
This will yield:
Date A B
0 01-01-2020 0 1
1 01-01-2020 2 3
2 02-01-2020 4 5
3 02-01-2020 6 7
4 03-01-2020 8 9
What I want:
A B C D
Date
01-01-2020 0 1 2 3
02-01-2020 4 5 6 7
03-01-2020 8 9 NaN NaN
The naming of the newly appended columns can be arbitrary. I don't even know whether appending would be the right way to go about it. Maybe it's easier to create a new DataFrame from scratch.
Upvotes: 2
Views: 107
Reputation: 862511
Use DataFrame.set_index
with DataFrame.stack
for unpivot data and then pivoting by GroupBy.cumcount
and Series.unstack
:
df1 = df.set_index('Date').stack().reset_index(name='val')
df = df1.set_index(['Date', df1.groupby('Date').cumcount()])['val'].unstack()
print (df)
0 1 2 3
Date
01-01-2020 0.0 1.0 2.0 3.0
02-01-2020 4.0 5.0 6.0 7.0
03-01-2020 8.0 9.0 NaN NaN
If order of output values is not important:
df1 = df.melt('Date')
df = df1.set_index(['Date', df1.groupby('Date').cumcount()])['value'].unstack()
print (df)
0 1 2 3
Date
01-01-2020 0.0 2.0 1.0 3.0
02-01-2020 4.0 6.0 5.0 7.0
03-01-2020 8.0 9.0 NaN NaN
Another idea is use lambda function for reshape:
df = (df.set_index('Date')
.groupby(level=0)
.apply(lambda x: pd.Series(x.to_numpy().ravel()))
.unstack())
print (df)
0 1 2 3
Date
01-01-2020 0.0 1.0 2.0 3.0
02-01-2020 4.0 5.0 6.0 7.0
03-01-2020 8.0 9.0 NaN NaN
Upvotes: 1