Reputation: 125
I have the following data frame and would like to create a new variable Var3 based on values from Var2, year, and month. My data is in long format, thus every person (ID) has multiple rows (i.e., one row per measurement occasion). For the creation of the new variable, therefore, the data has to be grouped by ID, I guess.
dd <- read.table(text=
"ID Var1 Year Month Var2
1 0 2008 2 -4.17
1 0 2009 2 -3.17
1 0 2010 2 -2.17
1 0 2011 2 -1.17
1 1 2013 2 0.17
1 0 2014 10 2.84
2 0 2008 2 -3.33
2 0 2010 2 -1.33
2 1 2011 2 0.33
2 0 2013 2 5.43
2 0 2015 11 6.43
3 1 2010 2 0.85
3 0 2011 2 2.33
3 0 2013 2 4.86
3 0 2015 11 6.33
3 0 2017 10 4.12", header=TRUE)
The new variable should be built following specific rules:
For the first row in which Var2 >= 0, take value of Var2 in the row with Var1 == 1 (e.g. 0.17 for ID 1), add the result of this equation: the value of year from the first row below Var1 == 1 (e.g., 2014 for ID 1) minus the value of year from the row with Var1 == 1 (e.g., 2013 for ID 1) Further add the result of this equation: the value of month from the first row below Var1 == 1 (e.g., 10 for ID 1) minus the value of month from the row with Var1 == 1 (e.g., 2 for ID 1) divided by 12.
For the second row in which Var2 >= 0, take value of Var2 in the row with Var1 == 1 (e.g. 0.33 for ID 2), add the result of this equation: the value of year from the second row below Var1 == 1 (e.g., 2015 for ID 2) minus the value of year from the row with Var1 == 1 (e.g., 2011 for ID 2) Further add the result of this equation: the value of month from the second row below Var1 == 1 (e.g., 2 for ID 2) minus the value of month from the row with Var1 == 1 (e.g., 2 for ID 2) divided by 12.
...and so on for the third row in which Var2 >= 0 etcetera...
The final data frame should look like this:
dd_new <- read.table(text=
"ID Var1 Year Month Var2 Var3
1 0 2008 2 -4.17 -4.17
1 0 2009 2 -3.17 -3.17
1 0 2010 2 -2.17 -2.17
1 0 2011 2 -1.17 -1.17
1 1 2013 2 0.17 0.17
1 0 2014 10 2.89 1.836667
2 0 2008 2 -3.33 -3.33
2 0 2010 2 -1.33 -1.33
2 1 2011 2 0.33 0.33
2 0 2013 2 5.43 2.33
2 0 2015 11 6.43 5.08
3 1 2010 2 0.67 0.67
3 0 2011 2 2.33 1.67
3 0 2013 2 4.86 3.67
3 0 2015 11 6.33 6.42
3 0 2017 10 4.12 8.336667", header=TRUE)
Unfortunately, I do not know how to create a new variable using values from rows below a certain row (i.e. below the row in which Var1 == 1). Can anybody help me out?
Thanks a lot!
Upvotes: 1
Views: 1619
Reputation: 9277
This is how you can create Var3
following the updated specific rules
library(dplyr)
dd %>%
group_by(ID) %>%
mutate(
Var3 = ifelse(Var1 == 1 | Var2 < 0, Var2, Var2[Var1 == 1] + Year[row_number()] - Year[Var1 == 1] + (Month[row_number()] - Month[Var1 == 1])/12)
)
Output
# A tibble: 16 x 6
# Groups: ID [3]
ID Var1 Year Month Var2 Var3
<int> <int> <int> <int> <dbl> <dbl>
1 1 0 2008 2 -4.17 -4.17
2 1 0 2009 2 -3.17 -3.17
3 1 0 2010 2 -2.17 -2.17
4 1 0 2011 2 -1.17 -1.17
5 1 1 2013 2 0.17 0.17
6 1 0 2014 10 2.89 1.84
7 2 0 2008 2 -3.33 -3.33
8 2 0 2010 2 -1.33 -1.33
9 2 1 2011 2 0.33 0.33
10 2 0 2013 2 5.43 2.33
11 2 0 2015 11 6.43 5.08
12 3 1 2010 2 0.67 0.67
13 3 0 2011 2 2.33 1.67
14 3 0 2013 2 4.86 3.67
15 3 0 2015 11 6.33 6.42
16 3 0 2017 10 4.12 8.34
Upvotes: 1