Reputation: 11

Create variable measuring change over time

I have three variables: the first measures the intended vote choice one year before the election, and the second and third measure the effective vote choice (not all people were asked at the same time, so there are two variables measuring the effective vote choice). I want to find out whether the choice stayed the same or changed. How can I do this for all the observations in one go?

id  V1  V2  V3
1   50  NA  50  
2   20  NA  50
3   30  NA  20
4   30  NA  30
5   20  20  NA
6   40  NA  NA
7   50  NA  10
8   10  NA  10
9   40  NA  50
10  50  NA  NA

So I want to find out whether there is a difference between V1 and V2/V3. I thought of merging V2 and V3 first, but I am completely unsure whether that is the right approach. In the end, it should look like this (1 if there is a change, 0 if there is no change):

Upvotes: 1

Answers (2)

mac

Reputation: 3554

I'd approach this just the way you suggested: first combine the V2 and V3 variables using some suitable logic (maybe you need to consider a case where both V2 and V3 are present), then simply compare the V1 values to this new single measure of their effective vote.

With dplyr:

library(dplyr)

d <- tribble(~id,  ~V1,  ~V2,  ~V3,
1,   50,  NA,  50,  
2,   20,  NA,  50,
3,   30,  NA,  20,
4,   30,  NA,  30,
5,   20,  20,  NA,
6,   40,  NA,  NA,
7,   50,  NA,  10,
8,   10,  NA,  10,
9,   40,  NA,  50,
10,  50,  NA,  NA,
)

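d %>%
  # use V2 when it is available, otherwise fall back to V3
  mutate(effective_vote = if_else(is.na(V2), V3, V2),
         # 1 = vote changed, 0 = unchanged, NA = no effective vote recorded
         change = if_else(V1 != effective_vote, 1, 0))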

gives

# A tibble: 10 x 6
      id    V1    V2    V3 effective_vote change
   <dbl> <dbl> <dbl> <dbl>          <dbl>  <dbl>
 1     1    50    NA    50             50      0
 2     2    20    NA    50             50      1
 3     3    30    NA    20             20      1
 4     4    30    NA    30             30      0
 5     5    20    20    NA             20      0
 6     6    40    NA    NA             NA     NA
 7     7    50    NA    10             10      1
 8     8    10    NA    10             10      0
 9     9    40    NA    50             50      1
10    10    50    NA    NA             NA     NA

effective_vote = if_else(is.na(V2), V3, V2) says, in effect: use V2 as the effective vote if we have V2, otherwise use V3. This handles both edge cases. If V2 and V3 are both missing, effective_vote is NA and so is change (rows 6 and 10 above). If both are present, V2 wins. I chose V2 in the "both present" case since presumably V2 was collected closer in time to the election, so it might be a better representation of their "true" vote than V3, collected later, but you could make a different choice.
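
To make that "different choice" concrete, here is a minimal sketch. It assumes a small made-up data set (d2, not from the question) in which one respondent has both V2 and V3, and it uses dplyr's coalesce(), which returns the first non-missing value and so should follow the same rule as the if_else() above. The column names prefer_V2, prefer_V3, change_V2 and change_V3 are only illustrative.

```r
library(dplyr)

# Hypothetical rows, NOT from the question:
# respondent 2 has both V2 and V3, respondent 3 has neither
d2 <- tibble(id = c(1, 2, 3),
             V1 = c(50, 20, 40),
             V2 = c(NA, 20, NA),
             V3 = c(50, 30, NA))

res <- d2 %>%
  mutate(prefer_V2 = coalesce(V2, V3),  # V2 if present, otherwise V3
         prefer_V3 = coalesce(V3, V2),  # V3 if present, otherwise V2
         # 1 = changed, 0 = unchanged, NA = no effective vote recorded
         change_V2 = as.integer(V1 != prefer_V2),
         change_V3 = as.integer(V1 != prefer_V3))

res
```

For respondent 2 the two rules disagree: preferring V2 counts the vote as unchanged, while preferring V3 counts it as changed. It is probably worth checking how many such rows the real data contains before settling on one rule.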

Upvotes: 0

J.Red

Reputation: 51

A solution with dplyr:

library(dplyr)

dt <- data.frame(V1=c(50,20,30,30,20,40,50,10,40,50),
                 V2=c(NA,NA,NA,NA,20,NA,NA,NA,NA,NA),
                 V3=c(50,50,20,30,NA,NA,10,10,50,NA))
dt %>%
  mutate(V4 = coalesce(V2, V3)) %>%               # first non-missing of V2, V3
  mutate(change = case_when(V1 == V4 ~ 0,
                            V1 != V4 ~ 1)) %>%    # NA when V4 is missing
  select(change)

Result

Upvotes: 0
