revans00

Reputation: 17

remove rows containing NA based on condition

df <- data.frame(x = 1:7, y = c(NA, NA, 5, 10, NA, 20, 30))

From df, I want to remove the rows where y is NA, but only if the x value in that row is smaller than the x value in the row with the minimum y value. The result should be this data frame:

data.frame(x = 3:7, y = c(5, 10, NA, 20, 30))

dplyr solutions preferable!

Upvotes: 1

Views: 209

Answers (2)

Rui Barradas

Reputation: 76402

Use logical indices for each of the conditions and combine them with logical AND, &:

df <- data.frame(x = 1:7, y = c(NA, NA, 5, 10, NA, 20, 30))

i <- is.na(df$y)
j <- df$x < df$y
df[!i & j, ]
#  x  y
#3 3  5
#4 4 10
#6 6 20
#7 7 30
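
Note that the result above also drops row 5, while the expected output in the question keeps it. If that row should stay, the second index could instead compare x against the x value found in the minimum-y row. A minimal sketch under that assumption, relying on the fact that which.min() ignores NA values; the name kept is introduced here only for illustration:

```r
df <- data.frame(x = 1:7, y = c(NA, NA, 5, 10, NA, 20, 30))

i <- is.na(df$y)                    # TRUE where y is missing
j <- df$x < df$x[which.min(df$y)]   # TRUE where x comes before the minimum-y row
kept <- df[!(i & j), ]              # drop only the rows that meet both conditions
kept
```

Here the negation wraps the whole conjunction, so a row is removed only when it is both NA in y and located before the minimum.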

Upvotes: 1

akrun

Reputation: 887118

We could use which.min to get the index of the minimum 'y' value, use that index to extract the corresponding 'x', compare it with all the 'x' values, combine the comparison with the is.na check on 'y', and negate (!) the result:

subset(df, !(x < x[which.min(y)] & is.na(y)))

-output

  x  y
3 3  5
4 4 10
5 5 NA
6 6 20
7 7 30
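
To see why the one-liner works, the condition can be split into intermediate steps. This is only an illustrative sketch using the df from the question; the names min_row, x_at_min, to_drop, and result are not part of the original answer:

```r
df <- data.frame(x = 1:7, y = c(NA, NA, 5, 10, NA, 20, 30))

# which.min() skips NA values, so this is the position of the smallest non-NA y
min_row <- which.min(df$y)

# x value in the row that holds the minimum y
x_at_min <- df$x[min_row]

# rows to remove: y is NA AND x lies before the minimum-y row
to_drop <- df$x < x_at_min & is.na(df$y)

# keep everything else, including NA rows that come after the minimum
result <- df[!to_drop, ]
result
```

Because both conditions must hold for a row to be dropped, the NA in row 5 survives: its x is not smaller than the x at the minimum.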

Or the same logic can be applied with dplyr::filter

library(dplyr)
df %>%
    filter(!(x < x[which.min(y)] & is.na(y)))

-output

  x  y
1 3  5
2 4 10
3 5 NA
4 6 20
5 7 30

data

df <- structure(list(x = 1:7, y = c(NA, NA, 5, 10, NA, 20, 30)),
    class = "data.frame", row.names = c(NA, -7L))

Upvotes: 1
