Reputation: 7448
I have a df
A
that looks like,
type number date
1 1 2017-10-01
2 1 2017-10-31
1 2 2017-09-01
1 2 2017-08-01
2 2 2017-06-01
First I want to groupby number
and select groups that each group need to at least contain 1 row with type == 1
and one row with type == 2
; in order to do that I first check that the sizes of each group must be > 2,
g = A.groupby('number')
B = g.filter(lambda x: len(x) > 1) # df B gets the filter results on g
but I don't know how to also check type
values 1
and 2
are contained in every group acquired above.
Second, based on the groups acquired from step 1, I want to create a Boolean column in A
called type_2_before_type_1
, it is assigned True
if earliest date
for type 1
substracts earliest date
for type 2
is < 0
for each group, and False
otherwise. Since there could be many rows with type == 1
and many rows with type == 2
in each group (how to get g['A'].transform('min')
for type 1 and type 2 rows?). So the result df
A
looks like,
type number date type_2_before_type_1
1 1 2017-10-01 True
2 1 2017-10-31 True
1 2 2017-09-01 False
1 2 2017-08-01 False
2 2 2017-06-01 False
e.g. for the group with number == 2
, earliest date for type 1 2017-08-01
subtracts the earliest date for type 2 2017-06-01
, result in 61 days; hence set False
to type_2_before_type_1
.
Upvotes: 1
Views: 37
Reputation: 323236
g=df.groupby('number')
i=g.apply(lambda x : min(x['date'][x.type==1]))
j=g.apply(lambda x : min(x['date'][x.type==2]))
k=g.type.nunique()==2
df['New'] = df.number.map(~(i > j) & k)
df
Out[588]:
type number date New
0 1 1 2017-10-01 True
1 2 1 2017-10-31 True
2 1 2 2017-09-01 False
3 1 2 2017-08-01 False
4 2 2 2017-06-01 False
Upvotes: 2
Reputation: 402493
For the first part, use groupby
+ nunique
-
df.groupby('number').type.transform('nunique')
0 2
1 2
2 2
3 2
4 2
Name: type, dtype: int64
df = df[df.groupby('number').type.transform('nunique').gt(1)]
type number date
0 1 1 2017-10-01
1 2 1 2017-10-31
2 1 2 2017-09-01
3 1 2 2017-08-01
4 2 2 2017-06-01
The second part is a bit challenging, but you can groupby
, unstack
, call diff
and then get a mask -
df['type_2_before_type_1'] = \
df.groupby(['number', 'type'])\
.first()\
.unstack()\
.diff(-1, axis=1)\
.iloc[:, 0]\
.reindex(df.number)\
.astype(int)\
.lt(0)\
.values
df
type number date type_2_before_type_1
0 1 1 2017-10-01 True
1 2 1 2017-10-31 True
2 1 2 2017-09-01 False
3 1 2 2017-08-01 False
4 2 2 2017-06-01 False
Upvotes: 2