I have a dataset like this:
x time
1 TRUE 9
2 TRUE 8
3 TRUE 10
4 TRUE 5
5 TRUE 16
6 FALSE 2
7 FALSE 17
8 FALSE 6
9 TRUE 11
10 TRUE 7
11 TRUE 20
12 TRUE 3
13 TRUE 9
14 FALSE 4
15 FALSE 2
16 FALSE 10
17 TRUE 3
18 TRUE 6
19 TRUE 15
Using R, I would like to generate a new variable that assigns a number to each condition, based on x and time. Specifically, I would like to scan the data from the beginning and assign a number (e.g., 1) to the first row. When x changes between TRUE and FALSE, the number increases by 1. Within a run where x is FALSE, the number stays the same. Within a run where x is TRUE, the number stays the same while time < 10, but it increases by 1 at every row where time >= 10, and then stays the same until the next change condition is met.
Put differently, a change between TRUE and FALSE counts as a condition change. In addition, when x is TRUE, every row with time >= 10 also starts a new condition.
The output I would like to get is:
x time count
1 TRUE 9 1
2 TRUE 8 1
3 TRUE 10 2
4 TRUE 5 2
5 TRUE 16 3
6 FALSE 2 4
7 FALSE 17 4
8 FALSE 6 4
9 TRUE 11 5
10 TRUE 7 5
11 TRUE 20 6
12 TRUE 3 6
13 TRUE 9 6
14 FALSE 4 7
15 FALSE 2 7
16 FALSE 10 7
17 TRUE 3 8
18 TRUE 6 8
19 TRUE 15 9
I tried rleid(x), but it only looks at x and does not take the time variable into account. I would appreciate any advice on how to solve this in R.
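To see why an id based on x alone is not enough, the run-length id can be reproduced in base R with rle(). This is a minimal sketch using the x values from the desired output (19 rows); it is a stand-in for data.table's rleid(x), not its implementation. The id only changes when x flips, so the extra breaks the desired count has at rows 3, 5, 11 and 19 (x TRUE and time >= 10 inside a TRUE run) never appear.

```r
# x column of the example data (19 rows, as in the desired output)
x <- c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE,
       TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE,
       TRUE, TRUE, TRUE)

# Base R stand-in for rleid(x): one id per run of identical values
runs <- rle(x)
run_id <- rep(seq_along(runs$lengths), runs$lengths)
run_id
#  [1] 1 1 1 1 1 2 2 2 3 3 3 3 3 4 4 4 5 5 5
```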
You can use a for loop in base R.
# Your data, copied from @akrun
df1 <- structure(list(x = c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE,
FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE,
TRUE, TRUE), time = c(9L, 8L, 10L, 5L, 16L, 2L, 17L, 6L, 11L,
7L, 20L, 3L, 9L, 4L, 2L, 10L, 3L, 6L, 15L)), row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
"14", "15", "16", "17", "18", "19"), class = "data.frame")
# Create an empty `count` column
df1$count <- 0
# Assign 1 to the first row
df1$count[1] <- 1
# From the 2nd row to the last row, increase the count if one of two
# conditions is satisfied; otherwise keep the previous count.
for (k in 2:nrow(df1)) {
  # The two conditions for increasing the count:
  # (1) x changed from the previous row, OR (2) x is TRUE and time >= 10
  if (df1$x[k] != df1$x[k - 1] || (df1$x[k] && df1$time[k] >= 10)) {
    df1$count[k] <- df1$count[k - 1] + 1
  } else {
    df1$count[k] <- df1$count[k - 1]
  }
}
df1
x time count
1 TRUE 9 1
2 TRUE 8 1
3 TRUE 10 2
4 TRUE 5 2
5 TRUE 16 3
6 FALSE 2 4
7 FALSE 17 4
8 FALSE 6 4
9 TRUE 11 5
10 TRUE 7 5
11 TRUE 20 6
12 TRUE 3 6
13 TRUE 9 6
14 FALSE 4 7
15 FALSE 2 7
16 FALSE 10 7
17 TRUE 3 8
18 TRUE 6 8
19 TRUE 15 9
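The loop's test can also be evaluated for every row at once. A minimal sketch, assuming the same 19-row data: flag each row that starts a new group (x differs from the previous row, or x is TRUE and time >= 10), then take a cumulative sum of the flags. Row 1 is always flagged, which plays the role of df1$count[1] <- 1. The helper names x_changed and time_break are illustrative only.

```r
# Example data (same values as the dput in this thread)
df1 <- data.frame(
  x = c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE,
        TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE,
        TRUE, TRUE, TRUE),
  time = c(9L, 8L, 10L, 5L, 16L, 2L, 17L, 6L, 11L,
           7L, 20L, 3L, 9L, 4L, 2L, 10L, 3L, 6L, 15L)
)

n <- nrow(df1)
# TRUE where x differs from the previous row (row 1 has no predecessor)
x_changed <- c(TRUE, df1$x[-1] != df1$x[-n])
# TRUE where x is TRUE and time reaches the threshold
time_break <- df1$x & df1$time >= 10
# Every TRUE flag opens a new group; cumsum turns flags into group numbers
df1$count <- cumsum(x_changed | time_break)
df1$count
#  [1] 1 1 2 2 3 4 4 4 5 5 6 6 6 7 7 7 8 8 9
```

Because nothing is updated row by row, this version avoids the explicit loop and stays fast on long data frames.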
Here is one option with rleid: apply rleid to the column 'x' together with a numeric index created from the 'time' column.
library(data.table)
setDT(df1)[, count := rleid(x, replace(x, x, cumsum(time[x] >= 10)))]
-output
x time count
<lgcl> <int> <int>
1: TRUE 9 1
2: TRUE 8 1
3: TRUE 10 2
4: TRUE 5 2
5: TRUE 16 3
6: FALSE 2 4
7: FALSE 17 4
8: FALSE 6 4
9: TRUE 11 5
10: TRUE 7 5
11: TRUE 20 6
12: TRUE 3 6
13: TRUE 9 6
14: FALSE 4 7
15: FALSE 2 7
16: FALSE 10 7
17: TRUE 3 8
18: TRUE 6 8
19: TRUE 15 9
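To see what rleid() receives as its second argument, the helper index can be computed on its own. In replace(x, x, cumsum(time[x] >= 10)), FALSE rows become 0 and each TRUE row gets the number of TRUE rows so far with time >= 10. The sketch below then mimics rleid(x, idx) in base R by starting a new id whenever x or idx differs from the previous row; it is an illustration of the idea, not data.table's implementation.

```r
x <- c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE,
       TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE,
       TRUE, TRUE, TRUE)
time <- c(9L, 8L, 10L, 5L, 16L, 2L, 17L, 6L, 11L,
          7L, 20L, 3L, 9L, 4L, 2L, 10L, 3L, 6L, 15L)

# Helper index: 0 on FALSE rows, running count of time >= 10 on TRUE rows
idx <- replace(x, x, cumsum(time[x] >= 10))
idx
#  [1] 0 0 1 1 2 0 0 0 3 3 4 4 4 0 0 0 4 4 5

# Base R stand-in for rleid(x, idx): new id whenever x or idx changes
n <- length(x)
changed <- c(TRUE, x[-1] != x[-n] | idx[-1] != idx[-n])
count <- cumsum(changed)
count
#  [1] 1 1 2 2 3 4 4 4 5 5 6 6 6 7 7 7 8 8 9
```

x still has to be compared alongside idx: a FALSE run and a TRUE run that has not yet seen time >= 10 both carry index 0, so idx alone could miss a TRUE/FALSE switch.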
Or with dplyr (rleid() still comes from data.table, so both packages need to be loaded)
library(dplyr)
library(data.table)
df1 %>%
  mutate(count = rleid(x, replace(x, x, cumsum(time[x] >= 10))))
-output
x time count
1 TRUE 9 1
2 TRUE 8 1
3 TRUE 10 2
4 TRUE 5 2
5 TRUE 16 3
6 FALSE 2 4
7 FALSE 17 4
8 FALSE 6 4
9 TRUE 11 5
10 TRUE 7 5
11 TRUE 20 6
12 TRUE 3 6
13 TRUE 9 6
14 FALSE 4 7
15 FALSE 2 7
16 FALSE 10 7
17 TRUE 3 8
18 TRUE 6 8
19 TRUE 15 9
data
df1 <- structure(list(x = c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE,
FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE,
TRUE, TRUE), time = c(9L, 8L, 10L, 5L, 16L, 2L, 17L, 6L, 11L,
7L, 20L, 3L, 9L, 4L, 2L, 10L, 3L, 6L, 15L)), row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
"14", "15", "16", "17", "18", "19"), class = "data.frame")