karuno

Reputation: 411

Delete duplicate rows based on condition in another column

Let's say I have this data frame:

df <- data.frame(
  a = c(NA, 6, 6, 8),
  x = c(1, 2, 2, 4),
  y = c(NA, 2, NA, NA),
  z = c("apple", 2, "2", NA),
  d = c(NA, 5, 5, 5),
  stringsAsFactors = FALSE)

Rows 2 and 3 are duplicates apart from column y, where row 3 has NA. I want to delete the duplicate row that has the NA value so that the result looks like this:

df <- data.frame(
  a = c(NA, 6, 8),
  x = c(1, 2, 4),
  y = c(NA, 2, NA),
  z = c("apple", 2, NA),
  d = c(NA, 5, 5),
  stringsAsFactors = FALSE)

I tried this but it doesn't work:

df2 <- df %>% group_by(a, x, z, d) %>% filter(y == max(y))

Any suggestions?

Upvotes: 0

Views: 533

Answers (3)

Ronak Shah

Reputation: 389235

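Fill NA values with the previous non-NA value and select unique rows with distinct. Note that fill() overwrites every NA that has a non-NA value above it, including those in rows that are not duplicates (y and z in row 4 here), so use this only if changing those values is acceptable.

library(dplyr)
library(tidyr)

df %>% fill(everything()) %>% distinct()

#   a x  y     z  d
#1 NA 1 NA apple NA
#2  6 2  2     2  5
#3  8 4  2     2  5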
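
A possible variation, sketched here under the assumption that duplicates are defined by columns a, x, z and d (as in the question's own attempt): filling y only within each group leaves rows without a duplicate untouched. The name `result` is illustrative only.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  a = c(NA, 6, 6, 8),
  x = c(1, 2, 2, 4),
  y = c(NA, 2, NA, NA),
  z = c("apple", 2, "2", NA),
  d = c(NA, 5, 5, 5),
  stringsAsFactors = FALSE)

result <- df %>%
  group_by(a, x, z, d) %>%           # assumed key columns
  fill(y, .direction = "downup") %>% # fill y only inside each group
  ungroup() %>%
  distinct()                         # filled duplicates now collapse to one row
```

Because the fill never crosses group boundaries, the NA in y for the a = 8 row is kept rather than replaced by a value from another row.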

Upvotes: 0

Onyambu

Reputation: 79338

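Sort the rows so that NA values come last, then use the filled data only as a key for detecting duplicates, so the original values are kept:

library(dplyr)
library(tidyr)

df %>%
   arrange_all() %>%
   filter(!duplicated(fill(., everything())))

#    a x  y     z  d
# 1  6 2  2     2  5
# 2  8 4 NA  <NA>  5
# 3 NA 1 NA apple NA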

Upvotes: 1

foreach

Reputation: 99

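This keeps the first row of each (a, x, z, d) group, so it relies on the row with the non-NA y appearing before its duplicate in the data:

df %>% arrange(a, x, z, d) %>% distinct(a, x, z, d, .keep_all = TRUE)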

   a x  y     z  d
1  6 2  2     2  5
2  8 4 NA  <NA>  5
3 NA 1 NA apple NA
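
If the complete row is not guaranteed to come first, one option is to sort on is.na(y) inside each group before taking the first row. This is a minimal sketch, again assuming a, x, z and d are the key columns; `result` is an illustrative name.

```r
library(dplyr)

df <- data.frame(
  a = c(NA, 6, 6, 8),
  x = c(1, 2, 2, 4),
  y = c(NA, 2, NA, NA),
  z = c("apple", 2, "2", NA),
  d = c(NA, 5, 5, 5),
  stringsAsFactors = FALSE)

result <- df %>%
  group_by(a, x, z, d) %>%
  # within each group, rows with a non-NA y sort before rows with NA
  arrange(is.na(y), .by_group = TRUE) %>%
  slice(1) %>%   # keep one row per group, preferring the non-NA y
  ungroup()
```

Groups in which every y is NA still keep one row, so the a = 8 and a = NA rows survive.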

Upvotes: 0
