Tea Tree

Reputation: 984

How to compare values within a subgroup?

I have vote percentages for two candidates in a number of districts. I would like to check whether the two percentages in each district are very close, where I define very close as a difference of less than 0.5 percentage points (0.005 on the scale used below). The goal is a new boolean column that is TRUE if the vote was close in that district and FALSE otherwise.

MWE:

set.seed(1)
district <- rep(letters[1:5], each = 2)
candidate <- rep(LETTERS[1:2], 5)
vote_pct <- round(rnorm(10, 0.5, 0.01), 3)
my_df <- as.data.frame(cbind(district, candidate, vote_pct))

Thus the data looks like:

> my_df
   district candidate vote_pct
1         a         A    0.494
2         a         B    0.502
3         b         A    0.492
4         b         B    0.516
5         c         A    0.503
6         c         B    0.492
7         d         A    0.505
8         d         B    0.507
9         e         A    0.506
10        e         B    0.497

In my dataset, the percentages for two candidates in a district would not exceed 100%.

What I would like is

   district candidate vote_pct close
1         a         A    0.494 FALSE
2         a         B    0.502 FALSE
3         b         A    0.492 FALSE
4         b         B    0.516 FALSE
5         c         A    0.503 FALSE
6         c         B    0.492 FALSE
7         d         A    0.505 TRUE
8         d         B    0.507 TRUE
9         e         A    0.506 FALSE
10        e         B    0.497 FALSE

Or alternatively,

   district vote_diff close
1         a    -0.008 FALSE
...

In my dataset, there are many more columns in case that makes a difference.

Upvotes: 0

Views: 44

Answers (2)

Karthik S

Reputation: 11584

Does this work:

library(dplyr)
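my_df %>%
   group_by(district) %>%
   # vote_pct is stored as text in my_df, so convert it first;
   # close is TRUE when the two candidates differ by less than 0.005
   mutate(vote_pct = as.numeric(vote_pct),
          close = abs(vote_pct[candidate == 'A'] - vote_pct[candidate == 'B']) < 0.005)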
# A tibble: 10 x 4
# Groups:   district [5]
   district candidate vote_pct close
   <chr>    <chr>        <dbl> <lgl>
 1 a        A            0.494 FALSE
 2 a        B            0.502 FALSE
 3 b        A            0.492 FALSE
 4 b        B            0.516 FALSE
 5 c        A            0.503 FALSE
 6 c        B            0.492 FALSE
 7 d        A            0.505 TRUE 
 8 d        B            0.507 TRUE 
 9 e        A            0.506 FALSE
10 e        B            0.497 FALSE

Upvotes: 0

Ronak Shah

Reputation: 388817

You can calculate the absolute difference between the two candidates within each district and assign TRUE if it is less than 0.005.

library(dplyr)

my_df %>%
  group_by(district) %>%
  mutate(close = abs(diff(as.numeric(vote_pct))) < 0.005)

#   district candidate vote_pct close
#   <chr>    <chr>     <chr>    <lgl>
# 1 a        A         0.494    FALSE
# 2 a        B         0.502    FALSE
# 3 b        A         0.492    FALSE
# 4 b        B         0.516    FALSE
# 5 c        A         0.503    FALSE
# 6 c        B         0.492    FALSE
# 7 d        A         0.505    TRUE 
# 8 d        B         0.507    TRUE 
# 9 e        A         0.506    FALSE
#10 e        B         0.497    FALSE
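
The as.numeric() call above is there because the MWE builds my_df through cbind(), which first creates a character matrix, so vote_pct does not arrive as a number. A minimal sketch of the difference, with the values typed in from the table in the question; the names coerced and typed are only for illustration:

```r
district  <- rep(letters[1:5], each = 2)
candidate <- rep(LETTERS[1:2], 5)
vote_pct  <- c(0.494, 0.502, 0.492, 0.516, 0.503,
               0.492, 0.505, 0.507, 0.506, 0.497)

# cbind() on mixed vectors yields a character matrix, so every column is text
coerced <- as.data.frame(cbind(district, candidate, vote_pct))

# data.frame() keeps each column's own type
typed <- data.frame(district, candidate, vote_pct)

is.numeric(coerced$vote_pct)  # FALSE
is.numeric(typed$vote_pct)    # TRUE
```

If the real data is built with data.frame(), or read in with numeric columns, the as.numeric() wrapper should not be needed. On R versions before 4.0 the cbind() route likely gives a factor instead of a character column, in which case as.numeric(as.character(vote_pct)) would be the safer conversion.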

The second expected output can be achieved with:

my_df %>%
  group_by(district) %>%
  summarise(vote_diff = abs(diff(as.numeric(vote_pct))), 
            close = vote_diff < 0.005)

# district vote_diff close
#  <chr>        <dbl> <lgl>
#1 a            0.008 FALSE
#2 b            0.024 FALSE
#3 c            0.011 FALSE
#4 d            0.002 TRUE 
#5 e            0.009 FALSE
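
Note that the question's alternative output shows a signed difference (-0.008 for district a, i.e. A minus B), while the summary above reports the absolute value. If the sign matters, here is a rough sketch in base R rather than dplyr. It assumes exactly one A row and one B row per district and a numeric vote_pct; the data is typed in from the table in the question, and the names a, b, wide and res are only for illustration:

```r
my_df <- data.frame(
  district  = rep(letters[1:5], each = 2),
  candidate = rep(LETTERS[1:2], 5),
  vote_pct  = c(0.494, 0.502, 0.492, 0.516, 0.503,
                0.492, 0.505, 0.507, 0.506, 0.497)
)

# Split by candidate, then join the two halves on district
a <- my_df[my_df$candidate == "A", ]
b <- my_df[my_df$candidate == "B", ]
wide <- merge(a, b, by = "district", suffixes = c("_A", "_B"))

# Signed difference (A minus B); closeness is judged on its absolute value
res <- data.frame(district  = wide$district,
                  vote_diff = wide$vote_pct_A - wide$vote_pct_B)
res$close <- abs(res$vote_diff) < 0.005
res
```

Any extra columns in the real data would also pick up the _A and _B suffixes after merge(), so it may be simpler to subset to the three relevant columns first.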

Upvotes: 1
