Ivo
Ivo

Reputation: 4200

filter duplicates according to conditions

I have the messy result of merging two data frames and want to decide according to specified criteria.

The data looks as follows (only duplicates shown):

structure(list(date = structure(c(2347, 2347, 2347, 2347, 2347, 2347, 2347, 2347, 6962, 6962, 16442, 16442, 16442, 16442), class = "Date"),
               country = c("United Kingdom", "United Kingdom", "United Kingdom", "United Kingdom", "United Kingdom", "United Kingdom", "United Kingdom", 
                           "United Kingdom", "Greece", "Greece", "France", "France", "France", "France"), 
               city = c("Belfast", "Belfast", "Belfast", "Belfast", "Belfast", "Belfast", "Belfast", "Belfast", "Athens",  "Athens", "Paris", "Paris", "Paris", "Paris"), 
               diff_categories = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE), 
               diff_num1 = c(-1, -4, 0, -3, 3, 0, -1, -4, 0, 1, 0, 12, -12, 0), 
               diff_num2 = c(NA, NA, NA, NA, NA, NA, NA, NA, 0, 0, 1, 11, -10, 0), 
               df1_id = c("df1_197606050002", "df1_197606050002", "df1_197606050003", "df1_197606050003","df1_197606050004", "df1_197606050004", "df1_197606050006", 
                          "df1_197606050006","df1_198901230001", "df1_198901230001", "df1_201501070001", "df1_201501070001","df1_201501070002", "df1_201501070002"),
               df2_id = c("df2_101", "df2_102", "df2_101", "df2_102", "df2_101", "df2_102", "df2_101", "df2_102", "df2_216", "df2_219", "df2_510",  "df2_511",  "df2_510", "df2_511")), 
          row.names = c(NA, -14L), class = c("tbl_df", "tbl", "data.frame"))

I now want to only keep one row per instance of df1_id and decide which row according to the following criteria (in descending order; first is most important):

Can someone point out how to best implement this logic?

Upvotes: 0

Views: 24

Answers (1)

Karthik S
Karthik S

Reputation: 11548

Will this work:

library(dplyr)
df %>% 
   group_by(df1_id) %>% 
       filter(diff_categories == TRUE & diff_num1 == min(diff_num1) & diff_num2 == min(diff_num2))

Upvotes: 1

Related Questions