Create an order of rows to keep in R based on column matches

Question

This is a portion of my dataset:

df<-data.frame(
  group=c("A","A","A","A","B","B","C","C","D","D","E","F","G","G"), 
  first_name=c("Linda","Linda","Linda","Linda","Henry","Henry","Hazel","Hazel","Owen","Owen","Ava","Nora","Rose","Rose"),
  first_name_match=c("Linda","Linda", "John","John","Oliver","Oliver","Hazel","Violet","Owen","Owen/Ben","Ava",NA,"Alex/Rose","Alex"))

For each group,

1) If first_name and first_name_match are exactly equal in some rows, keep ONLY those rows and drop the other rows.

2) If there is an exact match and the other rows only match partially, keep ONLY the exact match. By partial match, I mean that first_name_match contains the first_name (e.g. "Owen/Ben" contains "Owen").

3) If there is no exact match, keep the rows that partially match, i.e. the rows where first_name_match contains the first_name.

4) If there is no exact or partial match, keep the rows anyway and flag them for further review.

Overall, the order of priority is: rows with an exact match first; if there is no exact match, rows with a partial match; otherwise keep the rows that have no match or are NA.

Please see the expected output:

df_final<-data.frame(
  group=c("A","A","B","B","C","D","E","F","G"), 
  first_name=c("Linda","Linda","Henry","Henry","Hazel","Owen","Ava","Nora","Rose"),
  first_name_match=c("Linda","Linda","Oliver","Oliver","Hazel","Owen","Ava",NA,"Alex/Rose"))

Ronak Shah · Accepted Answer

You can write a function that selects the rows to keep in each group based on these conditions.

library(dplyr)
library(stringr)

select_rows <- function(x, y) {
  # If the group has any exact match, keep only the exact matches
  if (any(x == y, na.rm = TRUE)) x == y
  # Otherwise, if the group has any partial match, keep only the partial matches
  else if (any(str_detect(y, x), na.rm = TRUE)) str_detect(y, x)
  # If neither is found, keep all rows of the group
  else TRUE
}

Then apply this function by group.

df %>% group_by(group) %>% filter(select_rows(first_name, first_name_match))


#  group first_name first_name_match
#  <chr> <chr>      <chr>
#1 A     Linda      Linda           
#2 A     Linda      Linda           
#3 B     Henry      Oliver          
#4 B     Henry      Oliver          
#5 C     Hazel      Hazel           
#6 D     Owen       Owen            
#7 E     Ava        Ava             
#8 F     Nora       NA              
#9 G     Rose       Alex/Rose
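
The question also asks to flag the rows that are kept without any match, which the filter above does not do. Below is a minimal sketch of one way to add that, not a definitive implementation. The column name match_type and the helper columns exact and partial are hypothetical names chosen for illustration. It also assumes the names should be compared as literal text, so it wraps the pattern in stringr's fixed() instead of treating first_name as a regular expression.

```r
library(dplyr)
library(stringr)

df <- data.frame(
  group = c("A","A","A","A","B","B","C","C","D","D","E","F","G","G"),
  first_name = c("Linda","Linda","Linda","Linda","Henry","Henry","Hazel",
                 "Hazel","Owen","Owen","Ava","Nora","Rose","Rose"),
  first_name_match = c("Linda","Linda","John","John","Oliver","Oliver","Hazel",
                       "Violet","Owen","Owen/Ben","Ava",NA,"Alex/Rose","Alex"),
  stringsAsFactors = FALSE
)

df_flagged <- df %>%
  group_by(group) %>%
  mutate(
    # Row-level checks; an NA in first_name_match counts as "no match"
    exact = !is.na(first_name_match) & first_name == first_name_match,
    partial = !is.na(first_name_match) &
      str_detect(first_name_match, fixed(first_name)),
    # Group-level label (hypothetical column): best kind of match in the group
    match_type = if (any(exact)) "exact" else if (any(partial)) "partial" else "none"
  ) %>%
  # Keep only the rows that achieve the group's best kind of match
  filter(
    (match_type == "exact" & exact) |
      (match_type == "partial" & partial) |
      match_type == "none"
  ) %>%
  ungroup() %>%
  select(-exact, -partial)

df_flagged
```

Rows with match_type equal to "none" are the ones to review by hand. With the sample data, these are the two rows of group B and the single row of group F.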
