Metel Stairs
Metel Stairs

Reputation: 127

Combine two dataframes that have the same data but prefer one enter over another

I have two dataframes that have more or less the same data which include two columns, 'User ID' (e.g.37f879fb-9667-4de2-9cc9-b48918fcaa1f) and 'Frequent User' (can either be 'Frequent' or 'Infrequent').

I am trying to merge these two dataframes together, but If one dataframe has 'Frequent' for Frequent User and the other is 'Infrequent', I want it to prefer and keep or replace it to Frequent user every time. I am unsure of how to achieve this

df1

structure(list(user_id = c("000f2b1e-dde2-4227-9122-d197c674dea8", 
"001abd72-53f1-436a-a26f-e3187543afb6", "001c1e12-a8f7-4459-9b23-6b8bdc5d2625", 
"002d8272-8ee2-4e1a-a523-5ba0813c20f9", "0037abe8-7623-4ac7-9fbb-7398f505c1f6", 
"003911c6-c013-43b7-9996-e771dbe3ac83"), frequent_user = c("Frequent", 
"Infrequent", "Infrequent", "Infrequent", "Infrequent", "Infrequent"
)), row.names = c(NA, 6L), class = "data.frame")

df2

structure(list(user_id = c("000f2b1e-dde2-4227-9122-d197c674dea8", 
"001abd72-53f1-436a-a26f-e3187543afb6", "001c1e12-a8f7-4459-9b23-6b8bdc5d2625", 
"002d8272-8ee2-4e1a-a523-5ba0813c20f9", "0037abe8-7623-4ac7-9fbb-7398f505c1f6", 
"003911c6-c013-43b7-9996-e771dbe3ac83"), frequent_user = c("Infrequent", 
"Infrequent", "Infrequent", "Infrequent", "Infrequent", "Infrequent"
)), row.names = c(NA, 6L), class = "data.frame")

Upvotes: 1

Views: 96

Answers (3)

Martin Gal
Martin Gal

Reputation: 16978

Here is a dplyr solution without a join-function:

library(dplyr)

df1 %>% 
  bind_rows(df2) %>% 
  group_by(user_id) %>% 
  arrange(user_id, frequent_user) %>% 
  slice(1) %>%
  ungroup()

returns

# A tibble: 6 x 2
  user_id                              frequent_user
  <chr>                                <chr>        
1 000f2b1e-dde2-4227-9122-d197c674dea8 Frequent     
2 001abd72-53f1-436a-a26f-e3187543afb6 Infrequent   
3 001c1e12-a8f7-4459-9b23-6b8bdc5d2625 Infrequent   
4 002d8272-8ee2-4e1a-a523-5ba0813c20f9 Infrequent   
5 0037abe8-7623-4ac7-9fbb-7398f505c1f6 Infrequent   
6 003911c6-c013-43b7-9996-e771dbe3ac83 Infrequent  

Upvotes: 2

Serkan
Serkan

Reputation: 1945

Here is a tidyverse-solution with left_join and transmute,

library(tidyverse)

left_join(
        df1, df2, by = "user_id"
) %>% group_by(user_id) %>% 
        transmute(
                frequent_user = case_when(
                        frequent_user.x == frequent_user.y ~ frequent_user.y,
                        TRUE ~ "Frequent"
                )
        )

Which gives the following ouptut,

# A tibble: 6 x 2
# Groups:   user_id [6]
  user_id                              frequent_user
  <chr>                                <chr>        
1 000f2b1e-dde2-4227-9122-d197c674dea8 Frequent     
2 001abd72-53f1-436a-a26f-e3187543afb6 Infrequent   
3 001c1e12-a8f7-4459-9b23-6b8bdc5d2625 Infrequent   
4 002d8272-8ee2-4e1a-a523-5ba0813c20f9 Infrequent   
5 0037abe8-7623-4ac7-9fbb-7398f505c1f6 Infrequent   
6 003911c6-c013-43b7-9996-e771dbe3ac83 Infrequent       

Upvotes: 2

ThomasIsCoding
ThomasIsCoding

Reputation: 101335

Try the code below

transform(
  merge(df1, df2, by = "user_id", all = TRUE),
  frequent_user = c("Infrequent", "Frequent")[1 + (rowSums(cbind(frequent_user.x, frequent_user.y) == "Frequent") > 0)]
)[c("user_id", "frequent_user")]

which gives

                               user_id frequent_user
1 000f2b1e-dde2-4227-9122-d197c674dea8      Frequent
2 001abd72-53f1-436a-a26f-e3187543afb6    Infrequent
3 001c1e12-a8f7-4459-9b23-6b8bdc5d2625    Infrequent
4 002d8272-8ee2-4e1a-a523-5ba0813c20f9    Infrequent
5 0037abe8-7623-4ac7-9fbb-7398f505c1f6    Infrequent
6 003911c6-c013-43b7-9996-e771dbe3ac83    Infrequent

Upvotes: 3

Related Questions