Reputation: 463
I am currently using data.table in R and I have a data set like the following:
ID mon age
1 1 22
1 2 56
1 5 106
2 1 34
2 3 65
2 4 76
I would like to create a variable called diff that calculates the difference in age within each ID's observations only if the mon variable is incrementing by 1. If it's not incrementing by 1 then I'd like diff to equal NA.
This is what I'd like the data set to look like:
ID mon age diff
1 1 22 NA
1 2 56 34
1 5 106 NA
2 1 34 NA
2 3 65 NA
2 4 76 11
I know this would need some type of if-else statement, but I'm not sure how to use one to iterate through each observation and check whether the mon variable is incrementing by exactly 1. Any insight would be greatly appreciated.
Upvotes: 1
Views: 135
Reputation: 887128
We can group by 'ID', take the difference of adjacent elements of 'age' with diff, and multiply it by a mask built from the diff of 'mon'. NA^(diff(mon) != 1) is 1 where 'mon' increases by exactly 1 and NA everywhere else, so every row whose gap is not 1 becomes NA.
library(dplyr)
df1 %>%
  group_by(ID) %>%
  mutate(diff = c(NA, diff(age)) * c(NA, NA^(diff(mon) != 1)))
# A tibble: 6 x 4
# Groups: ID [2]
# ID mon age diff
# <int> <int> <int> <dbl>
#1 1 1 22 NA
#2 1 2 56 34
#3 1 5 106 NA
#4 2 1 34 NA
#5 2 3 65 NA
#6 2 4 76 11
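The NA^ mask can look opaque, so here is a small base R sketch of what it evaluates to on the first ID. The data frame is rebuilt from the question's table under the name df1 used above; mask is only an illustrative variable name, not part of the answer.

```r
# Sample data rebuilt from the question (integer columns assumed)
df1 <- data.frame(
  ID  = c(1L, 1L, 1L, 2L, 2L, 2L),
  mon = c(1L, 2L, 5L, 1L, 3L, 4L),
  age = c(22L, 56L, 106L, 34L, 65L, 76L)
)

# Month gaps for ID 1 are 1 and 3, so the comparison gives FALSE, TRUE
gaps <- diff(df1$mon[df1$ID == 1])

# NA^FALSE is NA^0, which R evaluates to 1; NA^TRUE is NA^1, which stays NA
mask <- NA^(gaps != 1)
mask
```

Multiplying the age differences by this mask leaves them unchanged where the mask is 1 and turns them into NA elsewhere.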
Upvotes: 3
Reputation: 388982
You can use shift to get the previous value of mon and check whether the difference is 1.
library(data.table)
df[, diff := ifelse(mon - shift(mon) == 1, age - shift(age), NA), by = ID]
df
# ID mon age diff
#1: 1 1 22 NA
#2: 1 2 56 34
#3: 1 5 106 NA
#4: 2 1 34 NA
#5: 2 3 65 NA
#6: 2 4 76 11
Or, similarly, in dplyr we can use lag:
library(dplyr)
df %>%
  group_by(ID) %>%
  mutate(diff = if_else(mon - lag(mon) == 1, age - lag(age), NA_integer_))
Upvotes: 2