Reputation: 754
Dear data manipulation wizards,
I am trying to summarize my dataframe in a way that makes reference to row numbers, but I can't find a way to do it in a tidy way. My dataframe looks like this.
# Sample data frame.
df <- data.frame(value = c(1,2,1,1,2,4,5,3,2))
value
1 1
2 2
3 1
4 1
5 2
6 4
7 5
8 3
9 2
I need to create a column, which says TRUE
if the corresponding number in value
as well as the numbers in next 4 consecutive rows are all larger than or equal to 2. The resulting dataframe should look like this:
value largerThan
1 1 FALSE
2 2 FALSE
3 1 FALSE
4 1 FALSE
5 2 TRUE
6 4 NA
7 5 NA
8 3 NA
9 2 NA
Note the four NA
in the last four rows of largerThan
. This is because these these rows don't have 4 consecutive rows after them, so they can't be evaluated. This is what is
tripping me up, together with the fact that I don't know how to make reference to row numbers when using tidyverse
syntax. This was more straightforward with for loops
, but I can't think of equivalents.
Suggestions are much appreciated! Especially if you have tidyverse
or dplyr
solutions, since these are the packages used in the rest of the code.
Upvotes: 0
Views: 197
Reputation: 26218
Do this
df$largerthan <- zoo::rollsum(df$value >= 2, 5, na.pad = T, align = "left") >=5
value largerthan
1 1 FALSE
2 2 FALSE
3 1 FALSE
4 1 FALSE
5 2 TRUE
6 4 NA
7 5 NA
8 3 NA
9 2 NA
Following the logic, a baseR way
df$largerthan <- rowSums(cbind(df >=2, c(df[-1,], NA) >=2, c(df[-c(1:2), ], NA, NA) >=2, c(df[-c(1:3),], NA, NA, NA) >=2, c(df[-c(1:4),], NA, NA, NA, NA)>=2)) >=5
If you want to do it in dplyr
df %>% mutate(largerThan = (value >=2 & lead(value) >=2 & lead(value,2) >= 2 & lead(value, 3) >= 2 & lead(value,4) >= 2))
Upvotes: 1
Reputation: 527
Here is a dpylr
version, however it is less elegant than the other solutions using rollsums. dpylr
has the functions lead()
and lag()
. The solution would look the following way:
df <- data.frame(value = c(1,2,1,1,2,4,5,3,2))
df = df %>% mutate(
largerThan = value >= 2 & lead(value) >= 2 & lead(value, 2) >= 2 &
lead(value, 3) >= 2 & lead(value, 4) >= 2
)
As more leads you need, as easier it gets with alternative solutions.
Upvotes: 1
Reputation: 27732
sorry, not familiar enough with the tidyverse.. here is a possible data.table
solution. Perhaps you can transelate it to tidyverse
library( data.table )
setDT(df)
df[, largerThan := frollsum( value >= 2, n = 5, align = "left" ) > 4 ]
# value largerThan
# 1: 1 FALSE
# 2: 2 FALSE
# 3: 1 FALSE
# 4: 1 FALSE
# 5: 2 TRUE
# 6: 4 NA
# 7: 5 NA
# 8: 3 NA
# 9: 2 NA
Upvotes: 1