Reputation: 109
I have a data frame with a 3 columns and several hundred rows. A particular column contains one of three strings: "Open","Close","Cancel"
type unique_id group
1 Open 11468329881 g_2
2 Close 11468329881 g_2
3 Open 23254429881 g_3
4 Cancel 23254429881 g_3
5 Open 32550829881 g_4
6 Close 32550829881 g_4
7 Open 43254429881 g_5
8 Close 43254429881 g_5
9 Open 52627629881 g_6
10 Close 52627629881 g_6
11 Open 62747029881 g_7
12 Close 62747029881 g_7
13 Open 2499619881 g_8
14 Close 2499619881 g_8
15 Open 32975019881 g_9
16 Close 32975019881 g_9
17 Open 42975119881 g_10
18 Cancel 42975119881 g_10
19 Open 53560019881 g_11
20 Open 53560019881 g_11
21 Open 62521619881 g_12
22 Close 62521619881 g_12
23 Open 72663719881 g_13
24 Close 72663719881 g_13
25 Open 82663819881 g_14
26 Close 82663819881 g_14
27 Open 92747019881 g_15
28 Open 92747019881 g_15
29 Open 1499629881 g_15
30 Close 1499629881 g_15
I would like to loop through each group (ex: g_1
,g_2
) and subset the rows if the order is either "Open" "Close" or "Open" "Cancel" any other order should be ignored.
for example g_2 should subset
type unique_id group
1 Open 11468329881 g_2
2 Close 11468329881 g_2
and g_11
should be ignored since the order is "Open" "Open"
g_15
should subset
type unique_id group
29 Open 1499629881 g_15
30 Close 1499629881 g_15
Any help would be appreciated.
Edit: I apologize if I wasn't clear earlier. The solution proposed is not working for g_8 for the sample given below
Open 21921312463 g_1
Close 21921312463 g_1
Open 31032312463 g_2
Close 31032312463 g_2
Open 41032212463 g_3
Close 41032212463 g_3
Open 51032312463 g_4
Close 51032312463 g_4
Open 61032212463 g_5
Close 61032212463 g_5
Open 71032312463 g_6
Close 71032312463 g_6
Open 81032212463 g_7
Close 81032212463 g_7
Open 21921312463 g_8
Open 21921312463 g_8
Close 21921312463 g_8
Open 31032312463 g_9
Close 31032312463 g_9
Open 41032212463 g_10
Close 41032212463 g_10
Open 51032312463 g_11
Close 51032312463 g_11
Open 61032212463 g_12
Close 61032212463 g_12
Open 71032312463 g_13
Close 71032312463 g_13
Open 81032212463 g_14
Close 81032212463 g_14
I'd like g_8 to be filtered to give
Open 21921312463 g_8
Close 21921312463 g_8
and ignore the first row in the group
Upvotes: 3
Views: 81
Reputation: 887501
After grouping by 'group', filter
the rows by checking if all
the elements in vector
(c("Open", "Close")
) or |
in the other vector (c("Open", "Cancel")
) are present %in%
the 'type' column
library(dplyr)
df1 %>%
group_by(group) %>%
#group_by(group, unique_id) %>%
filter(all(c("Open", "Close") %in% type)| all(c("Open", "Cancel") %in% type))
If the grouping variable includes 'unique_id', update the group_by
line with group_by(group, unique_id)
Based on the updated dataset and the new logic, we check the next value to see if it is 'Close' or 'Cancel'
df2 %>%
group_by(group, unique_id) %>%
mutate(ind = which(type == "Open" & lead(type) %in% c("Close", "Cancel"))[1]) %>%
filter(!is.na(ind)) %>%
slice(ind[1]:(ind[1]+1)) %>%
select(-ind)
Upvotes: 4