Cina
Cina

Reputation: 10199

increment column records based on changes in other column in R

I would like to add 1 to session column when the difference between the first timestamp of the same session and followings timestamp records is more than 10 units.

in other words:
if the gap in timestamp column is more than 10 in the same session, add 1 to the rest of the sessions for a specific ID. So we shouldn't have the same session with the gap more than 10 in its records.
lets say:

df<-read.table(text="
ID      timestamp    session
1       10             1
1       12             1
1       15             1
1       21             1
1       25             1
1       27             2
1       29             2
2       11             1
2       22             2
2       27             2
2       32             2
2       42             2
2       43             3",header=T,stringsAsFactors = F)

In the example above, for ID==1 the session gap from the first record (timestamp==10) is more than 10 in row 4 (timestamp==21), so we add 1 to the rest of sessions. Whenever the session number change the difference of the first record of timestamp should be less than 10 in the same sassion, otherwise it should add to session.

result:  

ID      timestamp    session
1      *10             1
1       12             1
1       15             1
1      *21             2     <-- because 21-10 >= 10 it add 1 to the rest of sessions in this ID 
1       25             2
1       27             3
1       29             3
2       11             1
2      *22             2
2       27             2
2      *32             3     <-- because 32-22>= 10 it add 1 to the rest of session
2      *42             4     <-- because 42-32>=10
2       43             5

How can I do it in R?

Upvotes: 3

Views: 120

Answers (1)

Ben
Ben

Reputation: 30474

Perhaps a custom function might help that calculates a cumulative sum and resets once threshold reached. In this case, if you provide the function with the session data, it will provide a result that will include a cumulative "offset" for session, but only in rows when the session number did not increase. This addresses the case of ID 2 timestamp 22 where the difference > 10, but session number increased from 1 to 2.

library(tidyverse)

threshold <- 10

cumsum_with_reset <- function(x, session, threshold) {
  cumsum <- 0
  group <- 0
  result <- numeric()
  for (i in seq_along(x)) {
    cumsum <- cumsum + x[i]
    if (cumsum >= threshold) {
      if (session[i] == session[i-1]) {
        group <- group + 1
      }
      cumsum <- 0
    }
    result = c(result, group)
  }
  return (result)
}

df %>%
  group_by(ID) %>%
  mutate(diff = c(0, diff(timestamp)),
         cumdiff = cumsum_with_reset(diff, session, threshold),
         new_session = cumdiff + session)

Function adapted from this solution.

Output

      ID timestamp session  diff cumdiff new_session
   <int>     <int>   <int> <dbl>   <dbl>       <dbl>
 1     1        10       1     0       0           1
 2     1        12       1     2       0           1
 3     1        15       1     3       0           1
 4     1        21       1     6       1           2
 5     1        25       1     4       1           2
 6     1        27       2     2       1           3
 7     1        29       2     2       1           3
 8     2        11       1     0       0           1
 9     2        22       2    11       0           2
10     2        27       2     5       0           2
11     2        32       2     5       1           3
12     2        42       2    10       2           4
13     2        43       3     1       2           5

Upvotes: 1

Related Questions