Reputation: 11
I have three variables. The first measures the intended vote choice one year before the election; the second and third measure the effective vote choice (not all people were asked at the same time, so there are two variables measuring the effective vote choice). I want to find out whether the choice stayed the same or changed. How can I do this for all observations in one go?
id V1 V2 V3
1 50 NA 50
2 20 NA 50
3 30 NA 20
4 30 NA 30
5 20 20 NA
6 40 NA NA
7 50 NA 10
8 10 NA 10
9 40 NA 50
10 50 NA NA
So I want to find out whether there is a difference between V1 and V2/V3. I thought of merging V2 and V3 first, but I am unsure how to proceed. In the end, it should look like this (1 if there is a change, 0 if there is no change):
id change
1 0
2 1
3 1
4 0
5 0
6 NA
7 1
8 0
9 1
10 NA
Upvotes: 1
Views: 52
Reputation: 3554
I'd approach this just the way you suggested: first combine the V2 and V3 variables using some suitable logic (maybe you need to consider a case where both V2 and V3 are present), then simply compare the V1 values to this new single measure of their effective vote.
With dplyr:
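library(dplyr)  # also provides tribble() and %>%

d <- tribble(
  ~id, ~V1, ~V2, ~V3,
    1,  50,  NA,  50,
    2,  20,  NA,  50,
    3,  30,  NA,  20,
    4,  30,  NA,  30,
    5,  20,  20,  NA,
    6,  40,  NA,  NA,
    7,  50,  NA,  10,
    8,  10,  NA,  10,
    9,  40,  NA,  50,
   10,  50,  NA,  NA
)

d %>%
  mutate(
    # use V2 when it is present, otherwise fall back to V3
    effective_vote = if_else(is.na(V2), V3, V2),
    # 1 = changed, 0 = unchanged, NA = no effective vote recorded
    change = if_else(V1 != effective_vote, 1, 0)
  )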
gives
# A tibble: 10 x 6
      id    V1    V2    V3 effective_vote change
   <dbl> <dbl> <dbl> <dbl>          <dbl>  <dbl>
 1     1    50    NA    50             50      0
 2     2    20    NA    50             50      1
 3     3    30    NA    20             20      1
 4     4    30    NA    30             30      0
 5     5    20    20    NA             20      0
 6     6    40    NA    NA             NA     NA
 7     7    50    NA    10             10      1
 8     8    10    NA    10             10      0
 9     9    40    NA    50             50      1
10    10    50    NA    NA             NA     NA
effective_vote = if_else(is.na(V2), V3, V2) says, in effect: use V2 as the effective vote if we have V2; otherwise use V3. This handles the edge case where both V2 and V3 are missing, as well as the case where both V2 and V3 are present. (I chose to use V2 in the "both present" case, on the presumption that V2 was collected closer in time to the election and so might be a better representation of their "true" vote than V3, collected later, but you could make a different choice.)
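If you want to see whether the "both present" case actually matters before picking a rule, one option is to flag rows where V2 and V3 are both recorded and disagree. This is only a sketch: the rows in d2 are hypothetical (the question's data has no such row), and both_present, conflict and conflicts are names I made up for illustration.

```r
library(dplyr)

# Hypothetical rows (not from the question): ids 2 and 3 have both V2 and V3
d2 <- tribble(
  ~id, ~V1, ~V2, ~V3,
    1,  50,  NA,  50,
    2,  20,  20,  50,
    3,  30,  30,  30,
    4,  40,  NA,  NA
)

conflicts <- d2 %>%
  mutate(
    # TRUE only when both effective-vote variables were recorded
    both_present = !is.na(V2) & !is.na(V3),
    # TRUE only when both were recorded and they disagree
    conflict = both_present & V2 != V3
  ) %>%
  filter(conflict)

conflicts$id
```

Here only id 2 is flagged, since it has V2 = 20 but V3 = 50. If nothing is flagged in your real data, the choice between V2 and V3 in the "both present" case makes no difference to the result.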
Upvotes: 0
Reputation: 51
A solution with dplyr:
library(dplyr)

dt <- data.frame(V1 = c(50, 20, 30, 30, 20, 40, 50, 10, 40, 50),
                 V2 = c(NA, NA, NA, NA, 20, NA, NA, NA, NA, NA),
                 V3 = c(50, 50, 20, 30, NA, NA, 10, 10, 50, NA))

dt %>%
  mutate(V4 = coalesce(V2, V3)) %>%             # first non-missing of V2, V3
  mutate(change = case_when(V1 == V4 ~ 0,       # unchanged
                            V1 != V4 ~ 1)) %>%  # changed; NA when V4 is missing
  select(change)
Result
change
1 0
2 1
3 1
4 0
5 0
6 NA
7 1
8 0
9 1
10 NA
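For comparison, roughly the same logic can be written in base R without dplyr. This is a sketch under the same assumption as above (V2 takes priority over V3 when both are present); the variable name effective is made up for illustration, and the data are the ten rows from the question.

```r
# Data from the question
dt <- data.frame(V1 = c(50, 20, 30, 30, 20, 40, 50, 10, 40, 50),
                 V2 = c(NA, NA, NA, NA, 20, NA, NA, NA, NA, NA),
                 V3 = c(50, 50, 20, 30, NA, NA, 10, 10, 50, NA))

# Use V2 where it is recorded, otherwise V3 (NA if both are missing)
effective <- ifelse(is.na(dt$V2), dt$V3, dt$V2)

# Comparing with NA yields NA, so rows without an effective vote stay NA;
# as.integer() turns TRUE/FALSE into 1/0
dt$change <- as.integer(dt$V1 != effective)

dt$change
```

The comparison V1 != effective returns NA whenever the effective vote is missing, so no separate handling of the missing rows is needed.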
Upvotes: 0