Reputation: 2401
This question is closely related to the one I asked here, but it is not same.
My question is, suppose I have stacked data and I want pivot this table by Date and Name. My question is, when there are ambiguous values (more than one value for the same Date and Name) how can I choose between them ?
Name Date Value
a 2016-05-01 5
b 2016-05-01 7
c 2016-05-01 7
c 2016-05-01 8
a 2016-06-01 4
b 2016-06-01 3
c 2016-06-01 2
d 2016-06-01 2
Ideally I would like to "choose" the final value based on another column. For example suppose my dataset is:
Name Date Value PublishDate
a 2016-05-01 5 2016-04-01
b 2016-05-01 7 2016-04-01
c 2016-05-01 7 2016-03-01
c 2016-05-01 8 2016-04-01
a 2016-06-01 4 2016-04-01
b 2016-06-01 3 2016-04-01
c 2016-06-01 2 2016-04-01
d 2016-06-01 2 2016-04-01
Then, between the [c,2016-05-01] duplicates, I'd like the one that was published last to stay (PublishDate=2016-04-01) and the other one to be discarded.
Is is possible ?
Ultimately I need this output, so that I can perform matrix operations on it.
Data/Name a b c d
2016-05-01 5 7 8 NaN
2016-06-01 4 3 2 2
I took a look at this question, but it is not quite the same, since I want one value to be discarded based on kind of a "rule".
But that question gives the insight that tere might be some use of the pandas.pivot_table function to help acomplish it. But I couldn't find it.
Thanks !
Upvotes: 0
Views: 167
Reputation: 2401
Based on @dmb 's answer I came up with the following way of coping with the issues that I've stated on the comments:
df = df.sort_values(by='PublishDate').copy()
df = df.pivot_table(index = 'Date',columns = 'Stock',values = 'Value', aggfunc = 'last')
Upvotes: 0
Reputation: 1719
I would sort by PublishDate
, groupby
what you want using the last
aggregate function. It will return the last row (latest publish date for you) for each 'Name'.
df = df.sort_index(by='PublishDate').copy()
df.reset_index(inplace=True)
grouped_df = df.groupby('Name').last()
Upvotes: 1