Reputation: 17934
I have a Pandas dataframe looking roughly like this (with dt*
being datetime objects):
A B C D
1 dt1 X 1 1
2 None Y 4 1
3 dt2 X 2 2
4 None Y 8 2
5 None X 3 3
6 None Y 2 3
7 dt3 X 7 4
8 dt3 Y 1 4
And I would like to convert it into such a shape:
A X y D
1 dt1 1 4 1
2 dt2 2 8 2
3 None 3 2 3
4 dt3 7 1 4
It is a given that the datetime objects are either identical, or one is None
.
I have tried using the following:
pd.pivot_table(table, index=["D"], columns=["B"], values=["C","A"], aggfunc=lambda x: ''.join(set(x)))
But this just makes the "A" column disappear.
Other variations on the above have also failed:
table = table.pivot(index="D", columns='B')
This one, for instance, creates a hierarchical column dataframe which I have been unable to collapse into the desired structure.
Can you help me out?
Upvotes: 1
Views: 361
Reputation: 29719
After computing the pivot_table
, join the resulting DF
with the original DF
that gets grouped by D to find the first occurence of A.
piv = df.pivot_table(index=['D'], columns=['B'])['C']
piv.join(df.groupby('D')['A'].first()).reset_index().rename_axis([None], axis=1)
(OR)
df.set_index(['D', 'B'])['C'].unstack(1).join(df.groupby('D')['A'].first()).reset_index()
Both yield:
Upvotes: 1
Reputation: 863281
You can use:
#sort values by column A
df = df.sort_values('A', ascending=False)
#replace string 'None' to None if necessary
df.A = df.A.replace({'None': None})
#groupby by column D and forward fill first value in each group
df.A = df.groupby('D', sort=False)['A'].ffill()
#reshape by set_index with unstack
df = df.set_index(['A','D', 'B'])['C'].unstack().reset_index()
#reorder columns, remove column name
df = df[['A','X','Y','D']].rename_axis(None, axis=1)
#if necessary replace NaN to None
df.A = df.A.replace({np.nan: None})
#sort by column D
df = df.sort_values('D')
print (df)
A X Y D
1 dt1 1 4 1
2 dt2 2 8 2
0 None 3 2 3
3 dt3 7 1 4
Upvotes: 1