Reputation: 1288
I am working with data that looks like this:
ID Year Variable_of_Interest
1 a 2000 0
2 a 2001 0
3 a 2002 0
4 a 2003 0
5 a 2004 0
6 a 2005 1
7 a 2006 1
8 a 2007 1
9 a 2008 1
10 a 2009 1
11 b 2000 0
12 b 2001 0
13 b 2002 0
14 b 2003 1
15 b 2004 1
16 b 2005 1
17 b 2006 1
18 b 2007 1
19 b 2008 1
20 b 2009 1
21 c 2000 0
22 c 2001 0
23 c 2002 0
24 c 2003 0
25 c 2004 0
26 c 2005 0
27 c 2006 1
28 c 2007 1
29 c 2008 1
30 c 2009 1
31 d 2000 0
32 d 2001 0
33 d 2002 1
34 d 2003 1
35 d 2004 1
36 d 2005 1
37 d 2006 0
38 d 2007 0
39 d 2008 0
40 d 2009 0
The unit of analysis is ID. The IDs repeat across each year in the data. The variable of interest
column represents changes to the IDs, wherein some years they are a 0 and other years they are a 1
I want to create an additional column that codes changes (defined as going from 0 to 1) in the Variable_of_Interest
at the year before and after the change, while also ignoring changes from (1 to 0) (as seen when the ID is equal to "d").
Any code that can help me achieve this solution would be greatly appreciated!
Perferability I'd like the data to look like this:
ID Year Variable_of_Interest Solution
1 a 2000 0 -5
2 a 2001 0 -4
3 a 2002 0 -3
4 a 2003 0 -2
5 a 2004 0 -1
6 a 2005 1 0
7 a 2006 1 1
8 a 2007 1 2
9 a 2008 1 3
10 a 2009 1 4
11 b 2000 0 -3
12 b 2001 0 -2
13 b 2002 0 -1
14 b 2003 1 0
15 b 2004 1 1
16 b 2005 1 2
17 b 2006 1 3
18 b 2007 1 4
19 b 2008 1 5
20 b 2009 1 6
21 c 2000 0 -6
22 c 2001 0 -5
23 c 2002 0 -4
24 c 2003 0 -3
25 c 2004 0 -2
26 c 2005 0 -1
27 c 2006 1 0
28 c 2007 1 1
29 c 2008 1 2
30 c 2009 1 3
31 d 2000 0 -2
32 d 2001 0 -1
33 d 2002 1 0
34 d 2003 1 1
35 d 2004 1 2
36 d 2005 1 3
37 d 2006 0 NA
38 d 2007 0 NA
39 d 2008 0 NA
40 d 2009 0 NA
Here is the replication code:
ID <- c(rep("a",10), rep("b", 10), rep("c", 10), rep("d", 10)); length(ID)
Year <- rep(seq(2000,2009, 1), 4)
Variable_of_Interest <- c(rep(0,5), rep(1, 5),
rep(0,3), rep(1, 7),
rep(0,6), rep(1, 4),
rep(0,2), rep(1, 4), rep(0,4))
data.frame(ID, Year, Variable_of_Interest)
Thank you for your help!
Upvotes: 0
Views: 46
Reputation: 389155
We could create a function :
library(dplyr)
get_sequence <- function(x) {
inds <- which(x == 1 & lag(x) == 0)[1]
vals <- seq_along(x) - inds
inds <- which(x == 0 & lag(x) == 1)[1]
if(!is.na(inds)) vals[inds:length(x)] <- NA
return(vals)
}
and apply it for each ID
:
df %>% group_by(ID) %>% mutate(Solution = get_sequence(Variable_of_Interest))
# ID Year Variable_of_Interest Solution
#1 a 2000 0 -5
#2 a 2001 0 -4
#3 a 2002 0 -3
#4 a 2003 0 -2
#5 a 2004 0 -1
#6 a 2005 1 0
#7 a 2006 1 1
#8 a 2007 1 2
#9 a 2008 1 3
#10 a 2009 1 4
#11 b 2000 0 -3
#...
#...
#33 d 2002 1 0
#34 d 2003 1 1
#35 d 2004 1 2
#36 d 2005 1 3
#37 d 2006 0 NA
#38 d 2007 0 NA
#39 d 2008 0 NA
#40 d 2009 0 NA
Upvotes: 1