Sharif Amlani
Sharif Amlani

Reputation: 1288

Coding Changes in Variables in R

I am working with data that looks like this:

   ID Year Variable_of_Interest
1   a 2000                   0
2   a 2001                   0
3   a 2002                   0
4   a 2003                   0
5   a 2004                   0
6   a 2005                   1
7   a 2006                   1
8   a 2007                   1
9   a 2008                   1
10  a 2009                   1
11  b 2000                   0
12  b 2001                   0
13  b 2002                   0
14  b 2003                   1
15  b 2004                   1
16  b 2005                   1
17  b 2006                   1
18  b 2007                   1
19  b 2008                   1
20  b 2009                   1
21  c 2000                   0
22  c 2001                   0
23  c 2002                   0
24  c 2003                   0
25  c 2004                   0
26  c 2005                   0
27  c 2006                   1
28  c 2007                   1
29  c 2008                   1
30  c 2009                   1
31  d 2000                   0
32  d 2001                   0
33  d 2002                   1
34  d 2003                   1
35  d 2004                   1
36  d 2005                   1
37  d 2006                   0
38  d 2007                   0
39  d 2008                   0
40  d 2009                   0

The unit of analysis is ID. The IDs repeat across each year in the data. The variable of interest column represents changes to the IDs, wherein some years they are a 0 and other years they are a 1

I want to create an additional column that codes changes (defined as going from 0 to 1) in the Variable_of_Interest at the year before and after the change, while also ignoring changes from (1 to 0) (as seen when the ID is equal to "d").

Any code that can help me achieve this solution would be greatly appreciated!

Perferability I'd like the data to look like this:

   ID Year Variable_of_Interest Solution
1   a 2000                   0       -5
2   a 2001                   0       -4
3   a 2002                   0       -3
4   a 2003                   0       -2
5   a 2004                   0       -1
6   a 2005                   1        0
7   a 2006                   1        1
8   a 2007                   1        2
9   a 2008                   1        3
10  a 2009                   1        4
11  b 2000                   0       -3
12  b 2001                   0       -2
13  b 2002                   0       -1
14  b 2003                   1        0
15  b 2004                   1        1
16  b 2005                   1        2
17  b 2006                   1        3
18  b 2007                   1        4
19  b 2008                   1        5
20  b 2009                   1        6
21  c 2000                   0       -6
22  c 2001                   0       -5
23  c 2002                   0       -4
24  c 2003                   0       -3
25  c 2004                   0       -2
26  c 2005                   0       -1
27  c 2006                   1        0
28  c 2007                   1        1
29  c 2008                   1        2
30  c 2009                   1        3
31  d 2000                   0       -2
32  d 2001                   0       -1
33  d 2002                   1        0
34  d 2003                   1        1
35  d 2004                   1        2
36  d 2005                   1        3
37  d 2006                   0       NA
38  d 2007                   0       NA
39  d 2008                   0       NA
40  d 2009                   0       NA

Here is the replication code:

ID <- c(rep("a",10), rep("b", 10), rep("c", 10), rep("d", 10)); length(ID)
Year <- rep(seq(2000,2009, 1), 4)
Variable_of_Interest <- c(rep(0,5), rep(1, 5), 
                         rep(0,3), rep(1, 7), 
                         rep(0,6), rep(1, 4),
                         rep(0,2), rep(1, 4), rep(0,4))


data.frame(ID, Year, Variable_of_Interest)

Thank you for your help!

Upvotes: 0

Views: 46

Answers (1)

Ronak Shah
Ronak Shah

Reputation: 389155

We could create a function :

library(dplyr)

get_sequence <- function(x) {
  inds <- which(x == 1 & lag(x) == 0)[1]
  vals <- seq_along(x) - inds
  inds <- which(x == 0 & lag(x) == 1)[1]
  if(!is.na(inds))  vals[inds:length(x)] <- NA
  return(vals)
}

and apply it for each ID :

df %>% group_by(ID) %>% mutate(Solution = get_sequence(Variable_of_Interest)) 

#   ID Year Variable_of_Interest Solution
#1   a 2000                    0       -5
#2   a 2001                    0       -4
#3   a 2002                    0       -3
#4   a 2003                    0       -2
#5   a 2004                    0       -1
#6   a 2005                    1        0
#7   a 2006                    1        1
#8   a 2007                    1        2
#9   a 2008                    1        3
#10  a 2009                    1        4
#11  b 2000                    0       -3
#...
#...
#33  d 2002                    1        0
#34  d 2003                    1        1
#35  d 2004                    1        2
#36  d 2005                    1        3
#37  d 2006                    0       NA
#38  d 2007                    0       NA
#39  d 2008                    0       NA
#40  d 2009                    0       NA

Upvotes: 1

Related Questions