Reputation: 193
I have a question regarding data.table in R.
I have a dataset like this:
library(data.table)
data <- data.table(a=c(1:7,12,32,13),b=c(1,5,6,7,8,3,2,5,1,4))
a b
1: 1 1
2: 2 5
3: 3 6
4: 4 7
5: 5 8
6: 6 3
7: 7 2
8: 12 5
9: 32 1
10: 13 4
Now I want to generate a third column c that compares the value of a in each row with all previous values of b and checks whether any of those b values is bigger than a. For example, at row 5, a = 5 and the previous values of b are 1, 5, 6, 7. Since 6 and 7 are bigger than 5, c should be 1; otherwise it would be 0. The result should look like this:
a b c
1: 1 1 NA
2: 2 5 0
3: 3 6 1
4: 4 7 1
5: 5 8 1
6: 6 3 1
7: 7 2 1
8: 12 5 0
9: 32 1 0
10: 13 4 0
I tried a for loop, but it takes a very long time. I also tried shift, but I cannot refer to multiple previous rows with shift. Does anyone have a recommendation?
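A for loop does not have to be slow here. Below is a minimal sketch, assuming plain numeric vectors a and b rather than a data.table, of a loop that only tracks the running maximum of the b values seen so far. Each row is then checked in constant time instead of rescanning all earlier rows. The function name `flag_prev_bigger_loop` is hypothetical.

```r
# Hypothetical helper: single-pass loop that keeps the running maximum of
# b[1..i-1], so each row is checked in O(1) instead of rescanning earlier rows.
flag_prev_bigger_loop <- function(a, b) {
  n <- length(a)
  out <- rep(NA_integer_, n)   # preallocate; row 1 has no previous b, so it stays NA
  if (n < 2) return(out)
  running_max <- b[1]          # largest b among rows 1..(i - 1) when row i is visited
  for (i in 2:n) {
    out[i] <- as.integer(running_max > a[i])  # 1 if some earlier b is bigger than a[i]
    running_max <- max(running_max, b[i])     # include row i before moving on
  }
  out
}

a <- c(1:7, 12, 32, 13)
b <- c(1, 5, 6, 7, 8, 3, 2, 5, 1, 4)
flag_prev_bigger_loop(a, b)
# returns c(NA, 0, 1, 1, 1, 1, 1, 0, 0, 0)
```

Preallocating `out` and working on plain vectors avoids the per-iteration overhead of modifying a table column inside the loop.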
Upvotes: 5
Reputation: 5532
This is a base R solution (see the dplyr solution below):
data$c <- NA_integer_
# For each row x >= 2, check whether any earlier b is bigger than a[x], stored as 1/0
data$c[2:nrow(data)] <- sapply(2:nrow(data), function(x) as.integer(any(data$a[x] < data$b[1:(x - 1)])))
## a b c
## 1: 1 1 NA
## 2: 2 5 0
## 3: 3 6 1
## 4: 4 7 1
## 5: 5 8 1
## 6: 6 3 1
## 7: 7 2 1
## 8: 12 5 0
## 9: 32 1 0
## 10: 13 4 0
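The sapply version still rescans all earlier rows for every row. Below is a minimal vectorised base R sketch of the same rule. It relies on the fact that some earlier b is bigger than a[i] exactly when the largest earlier b, `cummax(b)[i - 1]`, is bigger than a[i]. The helper name `flag_prev_bigger` is hypothetical, and a plain data.frame is assumed here.

```r
# Hypothetical helper: vectorised base R version of the rule.
# cummax(b)[i - 1] is the largest b in rows 1..(i - 1), so
# "some earlier b > a[i]" is equivalent to "cummax(b)[i - 1] > a[i]".
flag_prev_bigger <- function(a, b) {
  n <- length(a)
  if (n == 0) return(integer(0))
  prev_max <- c(NA, cummax(b)[-n])   # running max of b, shifted down one row
  as.integer(prev_max > a)           # row 1 compares against NA and stays NA
}

data <- data.frame(a = c(1:7, 12, 32, 13),
                   b = c(1, 5, 6, 7, 8, 3, 2, 5, 1, 4))
data$c <- flag_prev_bigger(data$a, data$b)
# data$c is c(NA, 0, 1, 1, 1, 1, 1, 0, 0, 0)
```

With a data.table, `data[, c := flag_prev_bigger(a, b)]` assigns the same column by reference.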
EDIT
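Here is a simpler solution using dplyr. Some earlier b is bigger than a exactly when the largest earlier b is, and the largest earlier b is the previous row's cumulative maximum, lag(cummax(b)):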
library(dplyr)
### Compare 'a' with the lagged cumulative max of 'b' (the largest earlier b) and set c to 1/0.
data %>% mutate(c = ifelse(a < lag(cummax(b)), 1, 0))
## a b c
## 1 1 1 NA
## 2 2 5 0
## 3 3 6 1
## 4 4 7 1
## 5 5 8 1
## 6 6 3 1
## 7 7 2 1
## 8 12 5 0
## 9 32 1 0
## 10 13 4 0
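To gain some confidence that the lagged-cummax shortcut matches the literal definition, here is a minimal base R sketch that compares it with a brute-force version on random data. Both function names are hypothetical, and `c(NA, x[-n])` plays the role of `dplyr::lag(x)`.

```r
# Brute force, following the definition literally: for row i, scan all earlier b.
# O(n^2), but useful as a reference.
flag_brute <- function(a, b) {
  n <- length(a)
  out <- rep(NA_integer_, n)
  for (i in seq_len(n)[-1]) {
    out[i] <- as.integer(any(b[seq_len(i - 1)] > a[i]))
  }
  out
}

# Shortcut: compare a with the lagged running maximum of b.
flag_cummax <- function(a, b) {
  n <- length(a)
  if (n == 0) return(integer(0))
  as.integer(c(NA, cummax(b)[-n]) > a)
}

# Compare both on many random inputs, including ties between a and b.
set.seed(1)
all_match <- TRUE
for (trial in 1:200) {
  n <- sample(1:30, 1)
  a <- sample(1:20, n, replace = TRUE)
  b <- sample(1:20, n, replace = TRUE)
  if (!identical(flag_brute(a, b), flag_cummax(a, b))) all_match <- FALSE
}
all_match
# TRUE
```

Both functions use a strict comparison, so ties (an earlier b equal to a) give 0 in each.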
### Using data.table's 'shift' inside dplyr's mutate (strict '<', because b must be bigger than a)
data %>% mutate(c = ifelse(a < shift(cummax(b)), 1, 0))
Upvotes: 2
Reputation: 18700
library(data.table)
data <- data.table(a=c(1:7,12,32,13),b=c(1,5,6,7,8,3,2,5,1,4))
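# c is 1 when the previous row's running max of b (the largest earlier b) is bigger than a
data[, c := as.integer(a < shift(cummax(b)))]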
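Whether to compare with `<` or `<=` only matters when an earlier b equals a. The question asks for b strictly bigger than a, which corresponds to `a < shift(cummax(b))`. `<=` also flags ties. The small base R sketch below uses a made-up tie and `c(NA, x[-n])` as a stand-in for data.table's `shift()`, whose default is a lag of 1 filled with NA.

```r
# Made-up tie example: in row 2 the only earlier b equals a (both are 5).
a <- c(1, 5)
b <- c(5, 2)
prev_max <- c(NA, cummax(b)[-length(b)])  # base R stand-in for shift(cummax(b))

strict    <- as.integer(a <  prev_max)    # "some earlier b is bigger than a"
nonstrict <- as.integer(a <= prev_max)    # also counts an earlier b equal to a
strict     # NA 0
nonstrict  # NA 1
```

On the question's data there are no such ties, so both operators give the same result there.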
Upvotes: 6