Reputation: 97
I'm working on the Brazilian local elections database and now I need to drop rows that contain more than one candidate ELECTED by Municipality. My data looks like:
>df
Candidate Municipality Situation
Rose A ELECTED
Mark A NON-ELECTED
Linda A ELECTED
Jhon A NON-ELECTED
James B ELECTED
Phillip B NON-ELECTED
Bruce B NON-ELECTED
In the case illustrated above, the elected candidate in municipality A had his mandate revoked, therefore, we had another election.
So, I'm looking for a routine that automatically drops any case like this in my data. The output I desire in this case is like:
>df
Candidate Municipality Situation
James B ELECTED
Phillip B NON-ELECTED
Bruce B NON-ELECTED
Regards,
Upvotes: 1
Views: 46
Reputation: 887991
Here is one option after grouping by 'Municipality', check the sum
of logical vector (Situation == "ELECTED"
) to return 1
library(dplyr)
df %>%
group_by(Municipality) %>%
filter(sum(Situation == "ELECTED") == 1)
# A tibble: 3 x 3
# Groups: Municipality [1]
# Candidate Municipality Situation
# <chr> <chr> <chr>
#1 James B ELECTED
#2 Phillip B NON-ELECTED
#3 Bruce B NON-ELECTED
Another option is n_distinct
df %>%
group_by(Municipality) %>%
filter(n_distinct(Candidate[Situation == "ELECTED"]) == 1)
In base R
,
subset(df, ave(Situation == "ELECTED", Municipality, FUN = sum) == 1)
# Candidate Municipality Situation
#5 James B ELECTED
#6 Phillip B NON-ELECTED
#7 Bruce B NON-ELECTED
df <- structure(list(Candidate = c("Rose", "Mark", "Linda", "Jhon",
"James", "Phillip", "Bruce"), Municipality = c("A", "A", "A",
"A", "B", "B", "B"), Situation = c("ELECTED", "NON-ELECTED",
"ELECTED", "NON-ELECTED", "ELECTED", "NON-ELECTED", "NON-ELECTED"
)), class = "data.frame", row.names = c(NA, -7L))
Upvotes: 1