ph_9933
ph_9933

Reputation: 97

Dropping rows by condition

I'm working on the Brazilian local elections database and now I need to drop rows that contain more than one candidate ELECTED by Municipality. My data looks like:

>df
Candidate     Municipality      Situation 
Rose               A            ELECTED
Mark               A            NON-ELECTED
Linda              A            ELECTED
Jhon               A            NON-ELECTED
James              B            ELECTED
Phillip            B            NON-ELECTED
Bruce              B            NON-ELECTED

In the case illustrated above, the elected candidate in municipality A had his mandate revoked, therefore, we had another election.

So, I'm looking for a routine that automatically drops any case like this in my data. The output I desire in this case is like:

>df
    Candidate     Municipality      Situation 
    James              B            ELECTED
    Phillip            B            NON-ELECTED
    Bruce              B            NON-ELECTED

Regards,

Upvotes: 1

Views: 46

Answers (1)

akrun
akrun

Reputation: 887991

Here is one option after grouping by 'Municipality', check the sum of logical vector (Situation == "ELECTED") to return 1

library(dplyr)
df %>%
  group_by(Municipality) %>%
  filter(sum(Situation == "ELECTED") == 1)
# A tibble: 3 x 3
# Groups:   Municipality [1]
#  Candidate Municipality Situation  
#  <chr>     <chr>        <chr>      
#1 James     B            ELECTED    
#2 Phillip   B            NON-ELECTED
#3 Bruce     B            NON-ELECTED

Another option is n_distinct

df %>%
  group_by(Municipality) %>%
  filter(n_distinct(Candidate[Situation == "ELECTED"]) == 1)

In base R,

subset(df, ave(Situation == "ELECTED", Municipality, FUN = sum) == 1)
#   Candidate Municipality   Situation
#5     James            B     ELECTED
#6   Phillip            B NON-ELECTED
#7     Bruce            B NON-ELECTED

data

df <- structure(list(Candidate = c("Rose", "Mark", "Linda", "Jhon", 
"James", "Phillip", "Bruce"), Municipality = c("A", "A", "A", 
"A", "B", "B", "B"), Situation = c("ELECTED", "NON-ELECTED", 
"ELECTED", "NON-ELECTED", "ELECTED", "NON-ELECTED", "NON-ELECTED"
)), class = "data.frame", row.names = c(NA, -7L))

Upvotes: 1

Related Questions