Reputation: 984
I have vote percentages for two candidates in a number of districts. I would like to flag districts where the two percentages are very close, which I define as a difference of less than 0.5 percentage points (0.005 on the 0-1 scale used below). The goal is a new boolean column that is TRUE if the vote was close in that district and FALSE otherwise.
MWE:
set.seed(1)
district <- rep(letters[1:5], each = 2)
candidate <- rep(LETTERS[1:2], 5)
vote_pct <- round(rnorm(10, 0.5, 0.01), 3)
# Note: cbind() builds a character matrix, so vote_pct is stored as character in my_df
my_df <- as.data.frame(cbind(district, candidate, vote_pct))
Thus the data looks like:
> my_df
district candidate vote_pct
1 a A 0.494
2 a B 0.502
3 b A 0.492
4 b B 0.516
5 c A 0.503
6 c B 0.492
7 d A 0.505
8 d B 0.507
9 e A 0.506
10 e B 0.497
In my real dataset, the two candidates' percentages within a district never sum to more than 100%.
What I would like is
district candidate vote_pct close
1 a A 0.494 FALSE
2 a B 0.502 FALSE
3 b A 0.492 FALSE
4 b B 0.516 FALSE
5 c A 0.503 FALSE
6 c B 0.492 FALSE
7 d A 0.505 TRUE
8 d B 0.507 TRUE
9 e A 0.506 FALSE
10 e B 0.497 FALSE
Or alternatively,
district vote_diff close
1 a -0.008 FALSE
...
In my dataset, there are many more columns in case that makes a difference.
Upvotes: 0
Reputation: 11584
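Does this work?
library(dplyr)
my_df %>%
  mutate(vote_pct = as.numeric(vote_pct)) %>%  # cbind() stored the percentages as character
  group_by(district) %>%
  # close when candidates A and B are within 0.005 (0.5 percentage points) of each other
  mutate(close = abs(vote_pct[candidate == 'A'] - vote_pct[candidate == 'B']) < 0.005)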
# A tibble: 10 x 4
# Groups: district [5]
district candidate vote_pct close
<chr> <chr> <dbl> <lgl>
1 a A 0.494 FALSE
2 a B 0.502 FALSE
3 b A 0.492 FALSE
4 b B 0.516 FALSE
5 c A 0.503 FALSE
6 c B 0.492 FALSE
7 d A 0.505 TRUE
8 d B 0.507 TRUE
9 e A 0.506 FALSE
10 e B 0.497 FALSE
Upvotes: 0
Reputation: 388817
You can calculate the absolute difference between the two candidates within each district and assign TRUE if it is less than 0.005.
library(dplyr)
my_df %>%
  group_by(district) %>%
  mutate(close = abs(diff(as.numeric(vote_pct))) < 0.005)
# district candidate vote_pct close
# <chr> <chr> <chr> <lgl>
# 1 a A 0.494 FALSE
# 2 a B 0.502 FALSE
# 3 b A 0.492 FALSE
# 4 b B 0.516 FALSE
# 5 c A 0.503 FALSE
# 6 c B 0.492 FALSE
# 7 d A 0.505 TRUE
# 8 d B 0.507 TRUE
# 9 e A 0.506 FALSE
#10 e B 0.497 FALSE
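If you would rather not depend on dplyr, here is a minimal base R sketch of the same idea. It rebuilds the example data from the question and assumes every district has exactly two rows; the name `spread` is made up for illustration. Any other columns in the data frame are left untouched.

```r
# Rebuild the example data from the question.
set.seed(1)
district  <- rep(letters[1:5], each = 2)
candidate <- rep(LETTERS[1:2], 5)
vote_pct  <- round(rnorm(10, 0.5, 0.01), 3)
my_df <- as.data.frame(cbind(district, candidate, vote_pct))

# cbind() turned vote_pct into text, so convert it back to numeric once.
my_df$vote_pct <- as.numeric(as.character(my_df$vote_pct))

# Largest minus smallest percentage per district. With exactly two rows
# per district (an assumption) this is the absolute difference.
spread <- tapply(my_df$vote_pct, my_df$district, function(x) diff(range(x)))

# Look up each row's district spread by name and apply the 0.005 threshold.
my_df$close <- as.vector(spread[as.character(my_df$district)]) < 0.005

print(my_df)
```

Note that a district with only one row would get a spread of 0 and be flagged as close, so it may be worth checking the row counts first with `table(my_df$district)`.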
The second expected output can be produced with:
my_df %>%
  group_by(district) %>%
  summarise(vote_diff = abs(diff(as.numeric(vote_pct))),
            close = vote_diff < 0.005)
# district vote_diff close
# <chr> <dbl> <lgl>
#1 a 0.008 FALSE
#2 b 0.024 FALSE
#3 c 0.011 FALSE
#4 d 0.002 TRUE
#5 e 0.009 FALSE
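The question's alternative output shows a signed difference (-0.008 for district a), while the summary above reports the absolute value. If the sign matters, a rough base R sketch that keeps it could look like this. It assumes one A row and one B row per district, and `a`, `b`, `wide` and `res` are illustrative names only.

```r
# Rebuild the example data from the question.
set.seed(1)
district  <- rep(letters[1:5], each = 2)
candidate <- rep(LETTERS[1:2], 5)
vote_pct  <- round(rnorm(10, 0.5, 0.01), 3)
my_df <- as.data.frame(cbind(district, candidate, vote_pct))

# cbind() turned vote_pct into text, so convert it back to numeric once.
my_df$vote_pct <- as.numeric(as.character(my_df$vote_pct))

# Split by candidate and join on district so each district is one row.
a <- my_df[my_df$candidate == "A", c("district", "vote_pct")]
b <- my_df[my_df$candidate == "B", c("district", "vote_pct")]
wide <- merge(a, b, by = "district", suffixes = c("_A", "_B"))

# Signed difference (A minus B); closeness is judged on its absolute value.
wide$vote_diff <- wide$vote_pct_A - wide$vote_pct_B
wide$close     <- abs(wide$vote_diff) < 0.005

res <- wide[, c("district", "vote_diff", "close")]
print(res)
```

A negative `vote_diff` means candidate B is ahead in that district, a positive one means candidate A is ahead.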
Upvotes: 1