jonnyblue8
jonnyblue8

Reputation: 109

Subsetting consecutive pairs of rows meeting string conditions

I have a data frame with a 3 columns and several hundred rows. A particular column contains one of three strings: "Open","Close","Cancel"

    type    unique_id   group
1   Open    11468329881 g_2
2   Close   11468329881 g_2
3   Open    23254429881 g_3
4   Cancel  23254429881 g_3
5   Open    32550829881 g_4
6   Close   32550829881 g_4
7   Open    43254429881 g_5
8   Close   43254429881 g_5
9   Open    52627629881 g_6
10  Close   52627629881 g_6
11  Open    62747029881 g_7
12  Close   62747029881 g_7
13  Open    2499619881  g_8
14  Close   2499619881  g_8
15  Open    32975019881 g_9
16  Close   32975019881 g_9
17  Open    42975119881 g_10
18  Cancel  42975119881 g_10
19  Open    53560019881 g_11
20  Open    53560019881 g_11
21  Open    62521619881 g_12
22  Close   62521619881 g_12
23  Open    72663719881 g_13
24  Close   72663719881 g_13
25  Open    82663819881 g_14
26  Close   82663819881 g_14
27  Open    92747019881 g_15
28  Open    92747019881 g_15
29  Open    1499629881  g_15
30  Close   1499629881  g_15

I would like to loop through each group (ex: g_1,g_2) and subset the rows if the order is either "Open" "Close" or "Open" "Cancel" any other order should be ignored.

for example g_2 should subset

    type    unique_id   group
1   Open    11468329881 g_2
2   Close   11468329881 g_2

and g_11 should be ignored since the order is "Open" "Open"

g_15 should subset

    type    unique_id   group
29  Open    1499629881  g_15
30  Close   1499629881  g_15

Any help would be appreciated.

Edit: I apologize if I wasn't clear earlier. The solution proposed is not working for g_8 for the sample given below

Open    21921312463 g_1
Close   21921312463 g_1
Open    31032312463 g_2
Close   31032312463 g_2
Open    41032212463 g_3
Close   41032212463 g_3
Open    51032312463 g_4
Close   51032312463 g_4
Open    61032212463 g_5
Close   61032212463 g_5
Open    71032312463 g_6
Close   71032312463 g_6
Open    81032212463 g_7
Close   81032212463 g_7
Open    21921312463 g_8
Open    21921312463 g_8
Close   21921312463 g_8
Open    31032312463 g_9
Close   31032312463 g_9
Open    41032212463 g_10
Close   41032212463 g_10
Open    51032312463 g_11
Close   51032312463 g_11
Open    61032212463 g_12
Close   61032212463 g_12
Open    71032312463 g_13
Close   71032312463 g_13
Open    81032212463 g_14
Close   81032212463 g_14

I'd like g_8 to be filtered to give

Open    21921312463 g_8
Close   21921312463 g_8

and ignore the first row in the group

Upvotes: 3

Views: 81

Answers (1)

akrun
akrun

Reputation: 887501

After grouping by 'group', filter the rows by checking if all the elements in vector (c("Open", "Close")) or | in the other vector (c("Open", "Cancel")) are present %in% the 'type' column

library(dplyr)
df1 %>% 
  group_by(group) %>% 
  #group_by(group, unique_id) %>%
  filter(all(c("Open", "Close") %in% type)| all(c("Open", "Cancel") %in% type))

If the grouping variable includes 'unique_id', update the group_by line with group_by(group, unique_id)

Update

Based on the updated dataset and the new logic, we check the next value to see if it is 'Close' or 'Cancel'

df2 %>% 
  group_by(group, unique_id) %>%
   mutate(ind = which(type == "Open" & lead(type) %in% c("Close", "Cancel"))[1]) %>% 
   filter(!is.na(ind)) %>% 
   slice(ind[1]:(ind[1]+1)) %>% 
   select(-ind)

Upvotes: 4

Related Questions