Reputation: 127
I have two dataframes that have more or less the same data which include two columns, 'User ID' (e.g.37f879fb-9667-4de2-9cc9-b48918fcaa1f) and 'Frequent User' (can either be 'Frequent' or 'Infrequent').
I am trying to merge these two dataframes together, but If one dataframe has 'Frequent' for Frequent User and the other is 'Infrequent', I want it to prefer and keep or replace it to Frequent user every time. I am unsure of how to achieve this
df1
structure(list(user_id = c("000f2b1e-dde2-4227-9122-d197c674dea8",
"001abd72-53f1-436a-a26f-e3187543afb6", "001c1e12-a8f7-4459-9b23-6b8bdc5d2625",
"002d8272-8ee2-4e1a-a523-5ba0813c20f9", "0037abe8-7623-4ac7-9fbb-7398f505c1f6",
"003911c6-c013-43b7-9996-e771dbe3ac83"), frequent_user = c("Frequent",
"Infrequent", "Infrequent", "Infrequent", "Infrequent", "Infrequent"
)), row.names = c(NA, 6L), class = "data.frame")
df2
structure(list(user_id = c("000f2b1e-dde2-4227-9122-d197c674dea8",
"001abd72-53f1-436a-a26f-e3187543afb6", "001c1e12-a8f7-4459-9b23-6b8bdc5d2625",
"002d8272-8ee2-4e1a-a523-5ba0813c20f9", "0037abe8-7623-4ac7-9fbb-7398f505c1f6",
"003911c6-c013-43b7-9996-e771dbe3ac83"), frequent_user = c("Infrequent",
"Infrequent", "Infrequent", "Infrequent", "Infrequent", "Infrequent"
)), row.names = c(NA, 6L), class = "data.frame")
Upvotes: 1
Views: 96
Reputation: 16978
Here is a dplyr
solution without a join
-function:
library(dplyr)
df1 %>%
bind_rows(df2) %>%
group_by(user_id) %>%
arrange(user_id, frequent_user) %>%
slice(1) %>%
ungroup()
returns
# A tibble: 6 x 2
user_id frequent_user
<chr> <chr>
1 000f2b1e-dde2-4227-9122-d197c674dea8 Frequent
2 001abd72-53f1-436a-a26f-e3187543afb6 Infrequent
3 001c1e12-a8f7-4459-9b23-6b8bdc5d2625 Infrequent
4 002d8272-8ee2-4e1a-a523-5ba0813c20f9 Infrequent
5 0037abe8-7623-4ac7-9fbb-7398f505c1f6 Infrequent
6 003911c6-c013-43b7-9996-e771dbe3ac83 Infrequent
Upvotes: 2
Reputation: 1945
Here is a tidyverse
-solution with left_join
and transmute
,
library(tidyverse)
left_join(
df1, df2, by = "user_id"
) %>% group_by(user_id) %>%
transmute(
frequent_user = case_when(
frequent_user.x == frequent_user.y ~ frequent_user.y,
TRUE ~ "Frequent"
)
)
Which gives the following ouptut,
# A tibble: 6 x 2
# Groups: user_id [6]
user_id frequent_user
<chr> <chr>
1 000f2b1e-dde2-4227-9122-d197c674dea8 Frequent
2 001abd72-53f1-436a-a26f-e3187543afb6 Infrequent
3 001c1e12-a8f7-4459-9b23-6b8bdc5d2625 Infrequent
4 002d8272-8ee2-4e1a-a523-5ba0813c20f9 Infrequent
5 0037abe8-7623-4ac7-9fbb-7398f505c1f6 Infrequent
6 003911c6-c013-43b7-9996-e771dbe3ac83 Infrequent
Upvotes: 2
Reputation: 101335
Try the code below
transform(
merge(df1, df2, by = "user_id", all = TRUE),
frequent_user = c("Infrequent", "Frequent")[1 + (rowSums(cbind(frequent_user.x, frequent_user.y) == "Frequent") > 0)]
)[c("user_id", "frequent_user")]
which gives
user_id frequent_user
1 000f2b1e-dde2-4227-9122-d197c674dea8 Frequent
2 001abd72-53f1-436a-a26f-e3187543afb6 Infrequent
3 001c1e12-a8f7-4459-9b23-6b8bdc5d2625 Infrequent
4 002d8272-8ee2-4e1a-a523-5ba0813c20f9 Infrequent
5 0037abe8-7623-4ac7-9fbb-7398f505c1f6 Infrequent
6 003911c6-c013-43b7-9996-e771dbe3ac83 Infrequent
Upvotes: 3