Lynn

Reputation: 77

Create sequence counter based on condition

I have a dataset like this:

       x time
1   TRUE    9
2   TRUE    8
3   TRUE   10
4   TRUE    5
5   TRUE   16
6  FALSE    2
7  FALSE   17
8  FALSE    6
9   TRUE   11
10  TRUE    7
11  TRUE   20
12  TRUE    3
13  TRUE    9
14 FALSE    4
15 FALSE    2
16 FALSE   10
17  TRUE    3
18  TRUE    6
19  TRUE   15

Using R, I would like to generate a new variable that assigns a unique number to each change of condition, based on x and time. Specifically, I would like to scan the data from the beginning and assign a number (e.g., 1) to the first row. Whenever x changes between TRUE and FALSE, the number increases. While x is FALSE, the number stays the same. While x is TRUE, the number stays the same as long as time < 10, but it also increases by 1 on each row where time >= 10, and then stays the same until the next change condition is met.

In other words, a change between TRUE and FALSE counts as a condition change. In addition, when x is TRUE, every row with time >= 10 is also treated as the start of a new condition.

The output I would like to get is:

       x time count
1   TRUE    9     1
2   TRUE    8     1
3   TRUE   10     2
4   TRUE    5     2
5   TRUE   16     3
6  FALSE    2     4
7  FALSE   17     4
8  FALSE    6     4
9   TRUE   11     5
10  TRUE    7     5
11  TRUE   20     6
12  TRUE    3     6
13  TRUE    9     6
14 FALSE    4     7
15 FALSE    2     7
16 FALSE   10     7
17  TRUE    3     8
18  TRUE    6     8
19  TRUE   15     9

I tried rleid(x), but it does not take the time variable into account. I would appreciate any advice on how to solve this in R!

Upvotes: 3

Views: 495

Answers (2)

Abdur Rohman

Reputation: 2944

You can use a for loop in base R.

# Your data, copied from @akrun
df1 <- structure(list(x = c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE,
                            FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE,
                            FALSE, FALSE, TRUE, TRUE, TRUE),
                      time = c(9L, 8L, 10L, 5L, 16L, 2L, 17L, 6L, 11L, 7L,
                               20L, 3L, 9L, 4L, 2L, 10L, 3L, 6L, 15L)),
                 row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9",
                               "10", "11", "12", "13", "14", "15", "16",
                               "17", "18", "19"),
                 class = "data.frame")

# Create an empty `count` column 

df1$count <- 0

# Assign 1 to the first row 

df1$count[1] <- 1

# From the 2nd row to the last row, increase the count if one of the two
# conditions is satisfied; otherwise keep the previous count.

for (k in 2:nrow(df1)) {
  # The two conditions for increasing the count:
  # (1) x changes from the previous row, OR (2) x is TRUE and time >= 10
  if (df1$x[k] != df1$x[k - 1] || (df1$x[k] && df1$time[k] >= 10)) {
    df1$count[k] <- df1$count[k - 1] + 1
  } else {
    df1$count[k] <- df1$count[k - 1]
  }
}

df1
       x time count
1   TRUE    9     1
2   TRUE    8     1
3   TRUE   10     2
4   TRUE    5     2
5   TRUE   16     3
6  FALSE    2     4
7  FALSE   17     4
8  FALSE    6     4
9   TRUE   11     5
10  TRUE    7     5
11  TRUE   20     6
12  TRUE    3     6
13  TRUE    9     6
14 FALSE    4     7
15 FALSE    2     7
16 FALSE   10     7
17  TRUE    3     8
18  TRUE    6     8
19  TRUE   15     9
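The two conditions checked inside the loop can also be evaluated for all rows at once. Below is a minimal base R sketch, assuming the same data as df1 (rebuilt here from plain vectors so the block runs on its own): flag every row that starts a new group, then take the running total of those flags with cumsum(). The names x_changed and new_group are illustrative, not part of the original answer.

```r
# Same data as df1 above, as plain vectors
x    <- c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE,
          TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE)
time <- c(9L, 8L, 10L, 5L, 16L, 2L, 17L, 6L, 11L, 7L,
          20L, 3L, 9L, 4L, 2L, 10L, 3L, 6L, 15L)

n <- length(x)

# Condition (1): the first row, or x differs from the previous row
x_changed <- c(TRUE, x[-1] != x[-n])

# Condition (2): x is TRUE and time >= 10
new_group <- x_changed | (x & time >= 10)

# Running total of group starts; cumsum() on a logical vector returns integers
count <- cumsum(new_group)
count
# -> 1 1 2 2 3 4 4 4 5 5 6 6 6 7 7 7 8 8 9
```

This gives the same count column as the loop without the explicit iteration; the loop may still be easier to extend if further conditions are added later.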

Upvotes: 0

akrun

Reputation: 887128

Here is one option with rleid: apply rleid to the column 'x' together with a numeric index created from the 'time' column.

library(data.table)
setDT(df1)[, count := rleid(x, replace(x, x, cumsum(time[x] >= 10)))]

-output

        x  time count
    <lgcl> <int> <int>
 1:   TRUE     9     1
 2:   TRUE     8     1
 3:   TRUE    10     2
 4:   TRUE     5     2
 5:   TRUE    16     3
 6:  FALSE     2     4
 7:  FALSE    17     4
 8:  FALSE     6     4
 9:   TRUE    11     5
10:   TRUE     7     5
11:   TRUE    20     6
12:   TRUE     3     6
13:   TRUE     9     6
14:  FALSE     4     7
15:  FALSE     2     7
16:  FALSE    10     7
17:   TRUE     3     8
18:   TRUE     6     8
19:   TRUE    15     9
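To see why this works, it can help to print the second argument passed to rleid(). The sketch below, in base R with the data rebuilt from plain vectors, shows that replace(x, x, cumsum(time[x] >= 10)) keeps 0 on FALSE rows (the logical x is coerced to integer) and, on TRUE rows, holds a running count of the TRUE rows with time >= 10. The last step is a hypothetical base R stand-in for rleid(x, idx), not data.table's own implementation: it starts a new id whenever either x or idx changes from the previous row.

```r
x    <- c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE,
          TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE)
time <- c(9L, 8L, 10L, 5L, 16L, 2L, 17L, 6L, 11L, 7L,
          20L, 3L, 9L, 4L, 2L, 10L, 3L, 6L, 15L)

# Sub-group index: 0 on FALSE rows; on TRUE rows, the number of TRUE rows
# so far (in order) with time >= 10
idx <- replace(x, x, cumsum(time[x] >= 10))
idx
# -> 0 0 1 1 2 0 0 0 3 3 4 4 4 0 0 0 4 4 5

# Base R stand-in for rleid(x, idx): a new id starts on the first row and
# on any row where x or idx differs from the previous row
n <- length(x)
count <- cumsum(c(TRUE, x[-1] != x[-n] | idx[-1] != idx[-n]))
count
# -> 1 1 2 2 3 4 4 4 5 5 6 6 6 7 7 7 8 8 9
```

Note that rows 17 and 18 share idx = 4 with rows 11 to 13; they still get a new id because the FALSE rows in between change x.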

Or with dplyr (rleid still comes from data.table, so that package must be loaded as well)

library(dplyr)
df1 %>% 
   mutate(count = rleid(x, replace(x, x, cumsum(time[x] >= 10))))

-output

       x time count
1   TRUE    9     1
2   TRUE    8     1
3   TRUE   10     2
4   TRUE    5     2
5   TRUE   16     3
6  FALSE    2     4
7  FALSE   17     4
8  FALSE    6     4
9   TRUE   11     5
10  TRUE    7     5
11  TRUE   20     6
12  TRUE    3     6
13  TRUE    9     6
14 FALSE    4     7
15 FALSE    2     7
16 FALSE   10     7
17  TRUE    3     8
18  TRUE    6     8
19  TRUE   15     9

data

df1 <- structure(list(x = c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, 
FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, 
TRUE, TRUE), time = c(9L, 8L, 10L, 5L, 16L, 2L, 17L, 6L, 11L, 
7L, 20L, 3L, 9L, 4L, 2L, 10L, 3L, 6L, 15L)), row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", 
"14", "15", "16", "17", "18", "19"), class = "data.frame")

Upvotes: 1
