Reputation: 8628
I have the following dataframe:
df =
ID TYPE VD_0 VD_1 VD_2 VD_3
1 ABC V1234 456 123 564
2 DBC 456 A45 123 564
3 ABD 456 V1234 456 123
4 ABD 123 V1234 SSW 123
There is the following list of values of VD_0
, VD_1
, VD_2
and VD_3
:
myList = [V1234,456,A45]
I want to get only those rows in df
that have 2 sequencial occurances of values from myList
in columns VD_0
, VD_1
, VD_2
and VD_3
.
The result is this one:
result =
ID TYPE VD_0 VD_1 VD_2 VD_3
1 ABC V1234 456 123 564
2 DBC 456 A45 123 564
3 ABD 456 V1234 456 123
For example, in row with ID
1 the values of VD_0
and VD_1
are equal to V1234
and 456
, correspondingly, and both of these values belong to myList
. The same logic is applied to rows with ID
2 (456
,A45
) and 3 (456
,V1234
).
How can I do it?
Upvotes: 4
Views: 84
Reputation: 32105
I agree with the beginning of MaxU's answer, yet, the end should be easier IIUC. The filter you want should get 2 consecutive matches from your list. You can get this answer by saying you want the row by row sum of isin result being at least a value of 2 if you sum them two by two. This is called a 2-period rolling window sum along axis=1. Then you take the max value of each row and the matches have a value greater or equal then 2:
subset = df.filter(like='VD_')
df[subset.isin(myList).rolling(2, axis=1).sum().max(axis=1)>=2]
Out[26]:
ID TYPE VD_0 VD_1 VD_2 VD_3
0 1 ABC V1234 456 123 564
1 2 DBC 456 A45 123 564
2 3 ABD 456 V1234 456 123
Upvotes: 2
Reputation: 210882
Try this:
In [112]: subset = df.filter(like='VD_')
In [113]: df[subset[subset.isin(myList)].stack().duplicated().unstack().any(1)]
Out[113]:
ID TYPE VD_0 VD_1 VD_2 VD_3
1 2 DBC 456 A45 123 564
2 3 ABD 456 V1234 456 123
3 4 ABD 123 V1234 SSW 123
Explanations:
In [114]: subset
Out[114]:
VD_0 VD_1 VD_2 VD_3
0 V1234 456 123 564
1 456 A45 123 564
2 456 V1234 456 123
3 123 V1234 SSW 123
In [115]: subset.isin(myList)
Out[115]:
VD_0 VD_1 VD_2 VD_3
0 True True False False
1 True True False False
2 True True True False
3 False True False False
In [116]: subset[subset.isin(myList)]
Out[116]:
VD_0 VD_1 VD_2 VD_3
0 V1234 456 NaN NaN
1 456 A45 NaN NaN
2 456 V1234 456 NaN
3 NaN V1234 NaN NaN
In [118]: subset[subset.isin(myList)].stack()
Out[118]:
0 VD_0 V1234
VD_1 456
1 VD_0 456
VD_1 A45
2 VD_0 456
VD_1 V1234
VD_2 456
3 VD_1 V1234
dtype: object
In [119]: subset[subset.isin(myList)].stack().duplicated()
Out[119]:
0 VD_0 False
VD_1 False
1 VD_0 True
VD_1 False
2 VD_0 True
VD_1 True
VD_2 True
3 VD_1 True
dtype: bool
In [120]: subset[subset.isin(myList)].stack().duplicated().unstack()
Out[120]:
VD_0 VD_1 VD_2
0 False False None
1 True False None
2 True True True
3 None True None
In [121]: subset[subset.isin(myList)].stack().duplicated().unstack().any(1)
Out[121]:
0 False
1 True
2 True
3 True
dtype: bool
Upvotes: 1