Dawny33

Reputation: 11081

Pandas: Groupby and cut within a group

I have a pandas dataframe which looks like this:

userid   name       date
1           name1    2016-06-04
1           name2    2016-06-05
1           name3    2016-06-04
1           name1    2016-06-06
2           name23   2016-06-01
2           name2    2016-06-01
3           name1    2016-06-03
3           name6    2016-06-03
3           name12   2016-06-03
3           name65   2016-06-04

For each user, I want to keep only the rows that fall on that user's first (earliest) date and drop the rest.

The final df would be as follows:

userid   name       date
1           name1    2016-06-04
1           name3    2016-06-04
2           name23   2016-06-01
2           name2    2016-06-01
3           name1    2016-06-03
3           name6    2016-06-03
3           name12   2016-06-03



The column dtypes are:

userid     int64
name      object
date      object

Each value in the date column is a datetime.date object.
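For reproducibility, the frame above could be built roughly as follows. This is only a sketch: the values are copied from the table, but the construction itself (a list of tuples, the variable name rows) is an assumption and not necessarily how the data was originally loaded.

```python
import datetime as dt

import pandas as pd

# Rows copied from the sample table; building them from tuples is an assumption.
rows = [
    (1, "name1", dt.date(2016, 6, 4)),
    (1, "name2", dt.date(2016, 6, 5)),
    (1, "name3", dt.date(2016, 6, 4)),
    (1, "name1", dt.date(2016, 6, 6)),
    (2, "name23", dt.date(2016, 6, 1)),
    (2, "name2", dt.date(2016, 6, 1)),
    (3, "name1", dt.date(2016, 6, 3)),
    (3, "name6", dt.date(2016, 6, 3)),
    (3, "name12", dt.date(2016, 6, 3)),
    (3, "name65", dt.date(2016, 6, 4)),
]
df = pd.DataFrame(rows, columns=["userid", "name", "date"])

# datetime.date values are stored under the generic object dtype
print(df.dtypes)
```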

So the task involves grouping by userid, sorting by date, and then retaining only the rows with the first (earliest) date in each group.

Upvotes: 3

Views: 1372

Answers (1)

jezrael

Reputation: 862891

You can first sort the DataFrame by the date column with sort_values, then use groupby with apply and boolean indexing to keep all rows in each group whose date equals the group's first value:

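# Parentheses are required so the method chain can span several lines.
# After sorting, x.date.iloc[0] is the earliest date within each userid group.
df = (df.sort_values('date')
        .groupby('userid')
        .apply(lambda x: x[x.date == x.date.iloc[0]])
        .reset_index(drop=True))

print(df)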
   userid    name       date
0       1   name1 2016-06-04
1       1   name3 2016-06-04
2       2  name23 2016-06-01
3       2   name2 2016-06-01
4       3   name1 2016-06-03
5       3   name6 2016-06-03
6       3  name12 2016-06-03
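As a possible alternative (a sketch that is not part of the original answer), the same filter can be written without sorting or apply by comparing each row's date with its group's minimum via transform. The names first_date and result are made up for illustration, and the conversion with pd.to_datetime is an assumption added so that the column has a proper datetime dtype.

```python
import datetime as dt

import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "userid": [1, 1, 1, 1, 2, 2, 3, 3, 3, 3],
    "name": ["name1", "name2", "name3", "name1", "name23",
             "name2", "name1", "name6", "name12", "name65"],
    "date": [dt.date(2016, 6, 4), dt.date(2016, 6, 5), dt.date(2016, 6, 4),
             dt.date(2016, 6, 6), dt.date(2016, 6, 1), dt.date(2016, 6, 1),
             dt.date(2016, 6, 3), dt.date(2016, 6, 3), dt.date(2016, 6, 3),
             dt.date(2016, 6, 4)],
})

# Convert the datetime.date objects to datetime64 (assumption: this is acceptable)
df["date"] = pd.to_datetime(df["date"])

# Earliest date per user, broadcast back onto every row of that user
first_date = df.groupby("userid")["date"].transform("min")

# Keep only the rows whose date equals their user's earliest date
result = df[df["date"] == first_date].reset_index(drop=True)
print(result)
```

Because this is a plain boolean mask, the surviving rows keep their original relative order, and the userid column is never passed through apply.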

Upvotes: 3
