Reputation: 57
I have a dataset containing variables that give the voteshare of a party in a given year and district, and whether or not the respective party sent a candidate to parliament, like this:
year district party voteshare candidate
2000 A P1 50% 1
2000 A P2 30% 0
2000 A P3 20% 0
2000 B P1 43% 1
2000 B P2 21% 0
2000 B P3 34% 0
...
Now, I want to calculate each party's margin of loss/victory (i.e. how "close" the election was for the respective party). For a losing party, that is its voteshare minus the voteshare of the winning party (the party that sent a candidate to parliament); for the winning party, it is its voteshare minus the voteshare of the runner-up, such that:
year district party voteshare candidate margin
2000 A P1 50% 1 +20%
2000 A P2 30% 0 -20%
2000 A P3 20% 0 -30%
2000 B P1 43% 1 +9%
2000 B P2 21% 0 -22%
2000 B P3 34% 0 -9%
...
I don't know how to do that with dplyr...
Upvotes: 0
Views: 67
Reputation: 17853
Here is a solution using the data.table package:
library(data.table)
df1 <- structure(list(year = c(2000L, 2000L, 2000L, 2000L, 2000L, 2000L
), district = c("A", "A", "A", "B", "B", "B"), party = c("P1",
"P2", "P3", "P1", "P2", "P3"), voteshare = c("50%", "30%", "20%",
"43%", "21%", "34%"), candidate = c(1L, 0L, 0L, 1L, 0L, 0L)),
class = "data.frame", row.names = c(NA, -6L))
setDT(df1)
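# Parse "50%" into a number, compute the margin within each election,
# then format it back as a signed percentage string
df1[, margin := as.numeric(gsub("%", "", voteshare))][
  # Winner: gap to the runner-up; everyone else: gap to the winner
  # (grouped by year as well, so districts are not pooled across elections)
  , margin := fcase(candidate == 1, diff(tail(sort(margin), 2)),
                    candidate == 0, margin - max(margin)),
  by = .(year, district)][
  , margin := fcase(margin < 0, sprintf("%s%%", margin),
                    margin > 0, sprintf("+%s%%", margin),
                    margin == 0, "0%")]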
df1
#> year district party voteshare candidate margin
#> 1: 2000 A P1 50% 1 +20%
#> 2: 2000 A P2 30% 0 -20%
#> 3: 2000 A P3 20% 0 -30%
#> 4: 2000 B P1 43% 1 +9%
#> 5: 2000 B P2 21% 0 -22%
#> 6: 2000 B P3 34% 0 -9%
Upvotes: 0
Reputation: 389325
You can do:
library(dplyr)
df1 %>%
  # Turn voteshare into a number
  mutate(voteshare = readr::parse_number(voteshare)) %>%
  group_by(year, district) %>%
  mutate(margin = case_when(
    # When the candidate is sent to parliament,
    # subtract the second-highest voteshare
    candidate == 1 ~ voteshare - sort(voteshare, decreasing = TRUE)[2],
    # otherwise subtract the voteshare of the winning party
    TRUE ~ voteshare - voteshare[candidate == 1]))
# year district party voteshare candidate margin
# <int> <chr> <chr> <dbl> <int> <dbl>
#1 2000 A P1 50 1 20
#2 2000 A P2 30 0 -20
#3 2000 A P3 20 0 -30
#4 2000 B P1 43 1 9
#5 2000 B P2 21 0 -22
#6 2000 B P3 34 0 -9
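The result above keeps margin as a number, whereas the question shows it as a signed string such as +20%. If you need that format, something along these lines should work. This is only a sketch: the margin vector is copied by hand from the output above, and the name margin_label is just an illustrative choice.

```r
# Numeric margins as returned by the pipeline above (copied by hand)
margin <- c(20, -20, -30, 9, -22, -9)

# "%+g" always prints the sign, "%%" is a literal percent sign.
# Note that an exact tie would come out as "+0%".
margin_label <- sprintf("%+g%%", margin)

print(margin_label)
```

Inside the dplyr chain, the same sprintf() call could go into a final mutate() step.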
data
df1 <- structure(list(year = c(2000L, 2000L, 2000L, 2000L, 2000L, 2000L
), district = c("A", "A", "A", "B", "B", "B"), party = c("P1",
"P2", "P3", "P1", "P2", "P3"), voteshare = c("50%", "30%", "20%",
"43%", "21%", "34%"), candidate = c(1L, 0L, 0L, 1L, 0L, 0L)),
class = "data.frame", row.names = c(NA, -6L))
Upvotes: 1