Reputation: 11081
I have a pandas dataframe which looks like this:
userid name date
1 name1 2016-06-04
1 name2 2016-06-05
1 name3 2016-06-04
1 name1 2016-06-06
2 name23 2016-06-01
2 name2 2016-06-01
3 name1 2016-06-03
3 name6 2016-06-03
3 name12 2016-06-03
3 name65 2016-06-04
For each user, I want to keep only the rows that fall on that user's first (earliest) date and drop the rest.
The final df would be as follows:
userid name date
1 name1 2016-06-04
1 name3 2016-06-04
2 name23 2016-06-01
2 name2 2016-06-01
3 name1 2016-06-03
3 name6 2016-06-03
3 name12 2016-06-03
The dtypes are:
userid int64
name object
date object
The type() of the values in the date column is datetime.date.
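For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the sample frame. It assumes the dates are stored as datetime.date objects, as described above; the variable name rows is only for illustration.

```python
import datetime as dt

import pandas as pd

# Sample rows copied from the table in the question.
# The dates are plain datetime.date objects, so pandas keeps the
# column as dtype "object" rather than datetime64.
rows = [
    (1, "name1", dt.date(2016, 6, 4)),
    (1, "name2", dt.date(2016, 6, 5)),
    (1, "name3", dt.date(2016, 6, 4)),
    (1, "name1", dt.date(2016, 6, 6)),
    (2, "name23", dt.date(2016, 6, 1)),
    (2, "name2", dt.date(2016, 6, 1)),
    (3, "name1", dt.date(2016, 6, 3)),
    (3, "name6", dt.date(2016, 6, 3)),
    (3, "name12", dt.date(2016, 6, 3)),
    (3, "name65", dt.date(2016, 6, 4)),
]
df = pd.DataFrame(rows, columns=["userid", "name", "date"])

print(df.dtypes)
```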
So, the task would involve grouping by userid, sorting by date, and then retaining only the rows with the first (earliest) date.
Upvotes: 3
Views: 1372
Reputation: 862891
You can first sort the DataFrame by the date column with sort_values, then use groupby with apply and boolean indexing to get all rows whose date equals the first (earliest) value in each group:
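# Parentheses are needed so the chained calls can span several lines.
# After sorting, the first row of each group holds that user's earliest date.
df = (df.sort_values('date')
        .groupby('userid')
        .apply(lambda x: x[x.date == x.date.iloc[0]])
        .reset_index(drop=True))
print(df)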
userid name date
0 1 name1 2016-06-04
1 1 name3 2016-06-04
2 2 name23 2016-06-01
3 2 name2 2016-06-01
4 3 name1 2016-06-03
5 3 name6 2016-06-03
6 3 name12 2016-06-03
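The same filter can also be written without apply. This is a minimal sketch, not part of the answer above, assuming the column names userid and date from the question. The helper name keep_earliest_date_rows is hypothetical, and converting with pd.to_datetime is an assumption made so the per-group minimum runs on a datetime64 column instead of Python date objects.

```python
import datetime as dt

import pandas as pd


def keep_earliest_date_rows(df):
    """Keep the rows whose date equals the earliest date of their userid."""
    # Convert only for the comparison; the original column is left untouched.
    dates = pd.to_datetime(df["date"])
    # transform("min") returns each user's earliest date repeated on every
    # row of that user, aligned with the original index.
    earliest = dates.groupby(df["userid"]).transform("min")
    return df[dates == earliest].reset_index(drop=True)


# Sample data from the question.
sample = pd.DataFrame({
    "userid": [1, 1, 1, 1, 2, 2, 3, 3, 3, 3],
    "name": ["name1", "name2", "name3", "name1", "name23",
             "name2", "name1", "name6", "name12", "name65"],
    "date": [dt.date(2016, 6, 4), dt.date(2016, 6, 5), dt.date(2016, 6, 4),
             dt.date(2016, 6, 6), dt.date(2016, 6, 1), dt.date(2016, 6, 1),
             dt.date(2016, 6, 3), dt.date(2016, 6, 3), dt.date(2016, 6, 3),
             dt.date(2016, 6, 4)],
})

result = keep_earliest_date_rows(sample)
print(result)
```

Because this only builds a boolean mask, the rows keep their original order and no sort is needed.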
Upvotes: 3