sumshyftw
sumshyftw

Reputation: 1131

Conditionally determining value of column by looking at last group

I have test data that looks like this:

   Group Value
1      a     1
2      a     2
3      a     3
4      a     4
5      b     5
6      b     2
7      b     3
8      c     6
9      c     7
10     c     8
11     c     3
12     c     6
13     d     9
14     d    10
15     e     9

I am trying to create a vectorized approach, preferably using tidyverse tools that will create an additional column that notes if the Value is present in the previous grouping. Here is an example of how this would look like:

   Group Value In_Last_Group
1      a     1         FALSE
2      a     2         FALSE
3      a     3         FALSE
4      a     4         FALSE
5      b     5         FALSE
6      b     2          TRUE
7      b     3          TRUE
8      c     6         FALSE
9      c     7         FALSE
10     c     8         FALSE
11     c     3          TRUE
12     c     5          TRUE
13     d     9         FALSE
14     d    10         FALSE
15     e     9          TRUE

I have a way to do this using a standard for loop, but I have a large dataset and I believe it would be much faster if it was vectorized. Any help would be appreciated.

Here is the dput of the test data:

structure(list(Group = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 
3L, 3L, 3L, 3L, 3L, 4L, 4L, 5L), .Label = c("a", "b", "c", "d", 
"e"), class = "factor"), Value = c(1, 2, 3, 4, 5, 2, 3, 6, 7, 
8, 3, 6, 9, 10, 9)), .Names = c("Group", "Value"), row.names = c(NA, 
-15L), class = "data.frame")

Upvotes: 6

Views: 63

Answers (2)

chinsoon12
chinsoon12

Reputation: 25225

You can use a join to lookup values in the previous group to see if those values exists. It should be faster than looping through the groups. I am not familiar with tidyverse but here is an implementation in data.table (which should also be faster than tidyverse if your data is large enough):

library(data.table)
setDT(DF)
DF[, c("g", "pg") := .(r <- rleid(Group), r - 1L)]
DF[, ilg := FALSE][DF, on=.(pg=g, Value), ilg := TRUE]

output (note that there is a typo for Value in row 12 of OP's desired output):

    Group Value g pg   ilg
 1:     a     1 1  0 FALSE
 2:     a     2 1  0 FALSE
 3:     a     3 1  0 FALSE
 4:     a     4 1  0 FALSE
 5:     b     5 2  1 FALSE
 6:     b     2 2  1  TRUE
 7:     b     3 2  1  TRUE
 8:     c     6 3  2 FALSE
 9:     c     7 3  2 FALSE
10:     c     8 3  2 FALSE
11:     c     3 3  2  TRUE
12:     c     6 3  2 FALSE
13:     d     9 4  3 FALSE
14:     d    10 4  3 FALSE
15:     e     9 5  4  TRUE

data:

DF <- structure(list(Group = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 
3L, 3L, 3L, 3L, 3L, 4L, 4L, 5L), .Label = c("a", "b", "c", "d", 
"e"), class = "factor"), Value = c(1, 2, 3, 4, 5, 2, 3, 6, 7, 
8, 3, 6, 9, 10, 9)), .Names = c("Group", "Value"), row.names = c(NA, 
-15L), class = "data.frame")

Upvotes: 2

akrun
akrun

Reputation: 886948

We can nest after grouping by 'Group', then remove the first and last elements of the 'data' column, use map2 to do a comparison of corresponding elements and then append with the FALSE elements for the first group

library(dplyr)
library(purrr)
df2 <- df1 %>%
         group_by(Group) %>%
         nest

flag <-  map2(df2$data[-1], df2$data[-nrow(df2)], ~ 
      .x$Value %in% .y$Value) %>%
      unlist
df1$Last_Group <- c(rep(FALSE, nrow(df2$data[[1]])), flag)

Upvotes: 4

Related Questions