Sol
Sol

Reputation: 754

Summarise across rows by making reference to row numbers in R

Dear data manipulation wizards,

I am trying to summarize my dataframe in a way that makes reference to row numbers, but I can't find a way to do it in a tidy way. My dataframe looks like this.

# Sample data frame.
df <- data.frame(value = c(1,2,1,1,2,4,5,3,2))

  value
1     1
2     2
3     1
4     1
5     2
6     4
7     5
8     3
9     2

I need to create a column, which says TRUE if the corresponding number in value as well as the numbers in next 4 consecutive rows are all larger than or equal to 2. The resulting dataframe should look like this:

  value largerThan
1     1      FALSE
2     2      FALSE
3     1      FALSE
4     1      FALSE
5     2       TRUE
6     4         NA
7     5         NA
8     3         NA
9     2         NA

Note the four NA in the last four rows of largerThan. This is because these these rows don't have 4 consecutive rows after them, so they can't be evaluated. This is what is tripping me up, together with the fact that I don't know how to make reference to row numbers when using tidyverse syntax. This was more straightforward with for loops, but I can't think of equivalents.

Suggestions are much appreciated! Especially if you have tidyverse or dplyr solutions, since these are the packages used in the rest of the code.

Upvotes: 0

Views: 197

Answers (3)

AnilGoyal
AnilGoyal

Reputation: 26218

Do this

df$largerthan <- zoo::rollsum(df$value >= 2, 5, na.pad = T, align = "left") >=5

  value largerthan
1     1      FALSE
2     2      FALSE
3     1      FALSE
4     1      FALSE
5     2       TRUE
6     4         NA
7     5         NA
8     3         NA
9     2         NA

Following the logic, a baseR way

df$largerthan <- rowSums(cbind(df >=2, c(df[-1,], NA) >=2, c(df[-c(1:2), ], NA, NA) >=2, c(df[-c(1:3),], NA, NA, NA) >=2, c(df[-c(1:4),], NA, NA, NA, NA)>=2)) >=5

If you want to do it in dplyr

df %>% mutate(largerThan = (value >=2 & lead(value) >=2 & lead(value,2) >= 2 & lead(value, 3) >= 2 & lead(value,4) >= 2))

Upvotes: 1

Benjamin Schlegel
Benjamin Schlegel

Reputation: 527

Here is a dpylr version, however it is less elegant than the other solutions using rollsums. dpylr has the functions lead() and lag(). The solution would look the following way:

df <- data.frame(value = c(1,2,1,1,2,4,5,3,2))
df = df %>% mutate(
  largerThan = value >= 2 & lead(value) >= 2 & lead(value, 2) >= 2 &
    lead(value, 3) >= 2 & lead(value, 4) >= 2
)

As more leads you need, as easier it gets with alternative solutions.

Upvotes: 1

Wimpel
Wimpel

Reputation: 27732

sorry, not familiar enough with the tidyverse.. here is a possible data.table solution. Perhaps you can transelate it to tidyverse

library( data.table )
setDT(df)
df[, largerThan := frollsum( value >= 2, n = 5, align = "left" ) > 4 ]
#    value largerThan
# 1:     1      FALSE
# 2:     2      FALSE
# 3:     1      FALSE
# 4:     1      FALSE
# 5:     2       TRUE
# 6:     4         NA
# 7:     5         NA
# 8:     3         NA
# 9:     2         NA

Upvotes: 1

Related Questions