NinjaR
NinjaR

Reputation: 623

Remove duplicate rows checking duplicate values in multiple columns and keep the row where no NA values are present

I scanned stackoverflow for more than an hour to find a solution, but failed. So posting the question.

I have a dataframe from where I need to remove duplicates, but the trick is, the duplicate values can be in two different columns for separate rows. I need to remove the row where another column has an NA value.

Example data frame

Act    Func     Func_2 
generate numbers    odd           
generate numbers   and
generate print      <NA>
generate column     print
displays time       <NA>
displays date       time
displays print      time
displays task       <NA>

Since print is present in Func as well as Func_2 with same Act value in both rows, I need to remove the row where NA is present in Func_2. However, if the value in Act column would have been different, I would need to keep both rows.

Expected data frame

Act    Func     Func_2 
generate numbers    odd           
generate numbers   and
generate column     print
displays date       time
displays print      time
displays task       <NA>

Upvotes: 2

Views: 237

Answers (2)

tjebo
tjebo

Reputation: 23747

Try this one here:

df1 %>% group_by(Act) %>% # the following test will be done by group
  mutate(test = if_else(Func %in% Func_2,
                                if_else(is.na(Func_2), FALSE, TRUE),
                                TRUE)) %>% 
#this will create a logical helper column.  
                                filter(test == TRUE) #just for completeness

# A tibble: 6 x 4
# Groups:   Act [2]
  Act      Func    Func_2 test 
  <chr>    <chr>   <chr>  <lgl>
1 generate numbers odd    T    
2 generate numbers and    T    
3 generate column  print  T    
4 displays date    time   T    
5 displays print   time   T    
6 displays task    <NA>   T 

Upvotes: 1

ADF
ADF

Reputation: 572

Is this what you're after?

library(dplyr)

Make dataframe:

a <- c(1, 2, 3, 4)
b <- c(5, 6, NA, 7)
df <- data.frame(a, b)

Filter rows to ensure neither column is missing:

df <- filter(df, !is.na(a) & !is.na(b))

Upvotes: 0

Related Questions