Reputation: 4200
I have the messy result of merging two data frames and want to decide according to specified criteria.
The data looks as follows (only duplicates shown):
structure(list(date = structure(c(2347, 2347, 2347, 2347, 2347, 2347, 2347, 2347, 6962, 6962, 16442, 16442, 16442, 16442), class = "Date"),
country = c("United Kingdom", "United Kingdom", "United Kingdom", "United Kingdom", "United Kingdom", "United Kingdom", "United Kingdom",
"United Kingdom", "Greece", "Greece", "France", "France", "France", "France"),
city = c("Belfast", "Belfast", "Belfast", "Belfast", "Belfast", "Belfast", "Belfast", "Belfast", "Athens", "Athens", "Paris", "Paris", "Paris", "Paris"),
diff_categories = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE),
diff_num1 = c(-1, -4, 0, -3, 3, 0, -1, -4, 0, 1, 0, 12, -12, 0),
diff_num2 = c(NA, NA, NA, NA, NA, NA, NA, NA, 0, 0, 1, 11, -10, 0),
df1_id = c("df1_197606050002", "df1_197606050002", "df1_197606050003", "df1_197606050003","df1_197606050004", "df1_197606050004", "df1_197606050006",
"df1_197606050006","df1_198901230001", "df1_198901230001", "df1_201501070001", "df1_201501070001","df1_201501070002", "df1_201501070002"),
df2_id = c("df2_101", "df2_102", "df2_101", "df2_102", "df2_101", "df2_102", "df2_101", "df2_102", "df2_216", "df2_219", "df2_510", "df2_511", "df2_510", "df2_511")),
row.names = c(NA, -14L), class = c("tbl_df", "tbl", "data.frame"))
I now want to only keep one row per instance of df1_id
and decide which row according to the following criteria (in descending order; first is most important):
diff_categories
must be FALSE
diff_num1
should be as small as possiblediff_num2
should be as small as possibleCan someone point out how to best implement this logic?
Upvotes: 0
Views: 24
Reputation: 11548
Will this work:
library(dplyr)
df %>%
group_by(df1_id) %>%
filter(diff_categories == TRUE & diff_num1 == min(diff_num1) & diff_num2 == min(diff_num2))
Upvotes: 1