aviss
aviss

Reputation: 2439

Pandas - group, sort and keep first row

I have a dataframe I need to group by id, then sort by time and keep only the first record for each id. Tried different methods suggested in other answers but couldn't get the right result. Will be grateful for your help!

test = pd.DataFrame({'id' : [1,1,1,
                           2,2,
                           3,3,3,3],
                   'ref'  : ["search","social","direct",
                          "social","direct",
                          "direct",'social','social','social'],
                   'media':['video', 'page', 'video',
                           'page', 'pic', 
                            'pic', 'video', 'page', 'video'],
                   'time': ['2019-04-10 19:00:00.569', '2019-04-10 00:10:00.569', '2019-04-10 00:10:20.569',
                          '2019-04-10 12:10:00.569','2019-04-10 11:10:00.569',
                          '2019-04-10 22:10:00.569','2019-04-10 14:10:00.569','2019-04-10 14:30:00.569','2019-04-10 15:10:00.569']})

Expected result:

    id  ref     media
0   1   social  page
1   2   direct  pic
2   3   social  video

Upvotes: 9

Views: 4022

Answers (1)

cs95
cs95

Reputation: 402323

You can sort and then drop duplicates:

test.sort_values(by=['id', 'time']).drop_duplicates('id').drop('time',1)

   id     ref  media
1   1  social   page
4   2  direct    pic
6   3  social  video

Upvotes: 9

Related Questions