Remove duplicate rows checking duplicate values in multiple columns and keep the row where no NA values are present

Question

I searched Stack Overflow for more than an hour without finding a solution, so I am posting this question.

I have a data frame from which I need to remove duplicates. The catch is that the duplicated value can sit in two different columns, in separate rows: when a value in Func also appears in Func_2 for the same Act, I need to remove the row whose Func_2 is NA.

Example data frame

Act       Func     Func_2
generate  numbers  odd
generate  numbers  and
generate  print    NA
generate  column   print
displays  time     NA
displays  date     time
displays  print    time
displays  task     NA

Since print appears in Func as well as in Func_2 with the same Act value (generate), I need to remove the row where Func_2 is NA. The same applies to time within displays. However, if the Act values had been different, I would need to keep both rows.
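For anyone who wants to reproduce the example, here is a minimal sketch of the input. It assumes the columns are plain character vectors and that the empty Func_2 cells are NA; the name df1 is borrowed from the answer below.

```r
# Example input, assuming character columns and NA for the empty Func_2 cells
df1 <- data.frame(
  Act    = c("generate", "generate", "generate", "generate",
             "displays", "displays", "displays", "displays"),
  Func   = c("numbers", "numbers", "print", "column",
             "time", "date", "print", "task"),
  Func_2 = c("odd", "and", NA, "print",
             NA, "time", "time", NA),
  stringsAsFactors = FALSE
)

df1
```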

Expected data frame

Act       Func     Func_2
generate  numbers  odd
generate  numbers  and
generate  column   print
displays  date     time
displays  print    time
displays  task     NA

tjebo · Accepted Answer

Try this one here:

library(dplyr)

df1 %>%
  group_by(Act) %>%  # the following test is done within each Act group
  # logical helper column: FALSE only if Func also occurs in Func_2 and Func_2 is NA
  mutate(test = if_else(Func %in% Func_2, !is.na(Func_2), TRUE)) %>%
  filter(test)  # keep only the rows that pass the test

# A tibble: 6 x 4
# Groups:   Act [2]
  Act      Func    Func_2 test
1 generate numbers odd    T
2 generate numbers and    T
3 generate column  print  T
4 displays date    time   T
5 displays print   time   T
6 displays task    NA     T
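If dplyr is not available, the same rule can be sketched in base R. This is not part of the answer above, just one possible equivalent; it assumes character columns and NA for the empty Func_2 cells, and the names in_func2 and result are made up for illustration.

```r
# Example input, assuming character columns and NA for the empty Func_2 cells
df1 <- data.frame(
  Act    = c("generate", "generate", "generate", "generate",
             "displays", "displays", "displays", "displays"),
  Func   = c("numbers", "numbers", "print", "column",
             "time", "date", "print", "task"),
  Func_2 = c("odd", "and", NA, "print",
             NA, "time", "time", NA),
  stringsAsFactors = FALSE
)

# For each row: does its Func value occur in Func_2 within the same Act group?
in_func2 <- mapply(
  function(f, a) f %in% df1$Func_2[df1$Act == a],
  df1$Func, df1$Act
)

# Drop a row only when both conditions hold:
# its Func is found in Func_2 of the same group AND its own Func_2 is NA
result <- df1[!(in_func2 & is.na(df1$Func_2)), ]
rownames(result) <- NULL

result
```

The rows generate/print and displays/time are dropped, while displays/task stays because task never shows up in Func_2.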
