bziggy

Reputation: 463

R: creating a variable conditionally if another variable increments by one

I am currently using data.table in R and I have a data set like the following:

ID   mon   age
1    1     22
1    2     56
1    5     106
2    1     34
2    3     65
2    4     76

I would like to create a variable called diff that calculates the difference in age within each ID's observations only if the mon variable is incrementing by 1. If it's not incrementing by 1 then I'd like diff to equal NA.

This is what I'd like the data set to look like:

ID   mon   age   diff
1    1     22    NA
1    2     56    34
1    5     106   NA
2    1     34    NA
2    3     65    NA
2    4     76    11

I know this would need to be some type of if-else statement, but I'm not sure how to use one to iterate through each observation and check whether the mon variable is incrementing by exactly 1. Any insight would be greatly appreciated.

Upvotes: 1

Views: 135

Answers (2)

akrun

Reputation: 887128

We can group by 'ID', take the difference of adjacent elements of 'age', and multiply it by a mask built from the diff of 'mon'. The expression NA^(diff(mon) != 1) evaluates to 1 where 'mon' increases by exactly 1 and to NA everywhere else, so the rows where 'mon' does not increment by 1 become NA.

library(dplyr)
df1 %>%
  group_by(ID) %>%
  mutate(diff = c(NA, diff(age)) * c(NA, NA^(diff(mon) != 1)))
# A tibble: 6 x 4
# Groups:   ID [2]
#     ID   mon   age  diff
#  <int> <int> <int> <dbl>
#1     1     1    22    NA
#2     1     2    56    34
#3     1     5   106    NA
#4     2     1    34    NA
#5     2     3    65    NA
#6     2     4    76    11
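To see why the NA^ mask works, here is a small base R sketch using only the rows for ID 1 from the question. The variable names step_bad, mask, and res are made up for illustration; the idea relies on R defining anything raised to the power 0 as 1, while NA^1 stays NA.

```r
# Rows for ID 1 from the question
mon <- c(1, 2, 5)
age <- c(22, 56, 106)

# TRUE where 'mon' did NOT increase by exactly 1
step_bad <- diff(mon) != 1

# FALSE is treated as 0 and TRUE as 1, so:
#   NA^FALSE = NA^0 = 1   (keep the value)
#   NA^TRUE  = NA^1 = NA  (blank the value)
mask <- NA^step_bad

# Pad with NA for the first row of the group, then multiply
res <- c(NA, diff(age)) * c(NA, mask)
print(res)
```

Inside group_by(ID), mutate() applies the same computation to each ID separately.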

Upvotes: 3

Ronak Shah

Reputation: 388982

You can use shift to get the previous value of mon and check if the difference is 1.

library(data.table)
df[, diff := ifelse(mon - shift(mon) == 1, age - shift(age), NA), by = ID]
df

#   ID mon age diff
#1:  1   1  22   NA
#2:  1   2  56   34
#3:  1   5 106   NA
#4:  2   1  34   NA
#5:  2   3  65   NA
#6:  2   4  76   11
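For anyone who wants to run this end to end, here is a self-contained sketch that rebuilds the sample data from the question. It swaps ifelse for data.table's fifelse, which is not part of the answer above and is only available in recent data.table releases; fifelse is stricter about types, so the columns are built as integers and the fallback is NA_integer_.

```r
library(data.table)

# Sample data from the question, stored as integers
df <- data.table(
  ID  = c(1L, 1L, 1L, 2L, 2L, 2L),
  mon = c(1L, 2L, 5L, 1L, 3L, 4L),
  age = c(22L, 56L, 106L, 34L, 65L, 76L)
)

# shift() returns the previous value within each ID (NA for the first row),
# so the comparison is NA there and fifelse() passes that NA through
df[, diff := fifelse(mon - shift(mon) == 1L, age - shift(age), NA_integer_), by = ID]

print(df)
```

If age were stored as a double, the fallback would need to be NA_real_ instead.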

Or, similarly, in dplyr we can use lag:

library(dplyr)

df %>%
  group_by(ID) %>%
  mutate(diff = if_else(mon - lag(mon) == 1, age - lag(age), NA_integer_))
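A runnable version of the dplyr approach, as a sketch that assumes the columns are stored as integers (which is what NA_integer_ expects). The name res is made up for illustration, and the final ungroup() is an addition so the result is a plain tibble.

```r
library(dplyr)

# Sample data from the question, stored as integers
df <- tibble(
  ID  = c(1L, 1L, 1L, 2L, 2L, 2L),
  mon = c(1L, 2L, 5L, 1L, 3L, 4L),
  age = c(22L, 56L, 106L, 34L, 65L, 76L)
)

res <- df %>%
  group_by(ID) %>%
  # lag() gives the previous value within each ID (NA for the first row)
  mutate(diff = if_else(mon - lag(mon) == 1L, age - lag(age), NA_integer_)) %>%
  ungroup()

print(res)
```

Depending on the dplyr version, a double age column may require NA_real_ in place of NA_integer_.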

Upvotes: 2
