How to drop rows containing NA in specified columns?

Question

I have a dataframe like this

      dep_delay  temp  humid wind_dir precip pressure  date
16983  3         68.00 53.06 NA       0      1020.8    2013-05-07 

26477  42        NA  64.93 360      0        NA        2013-03-07
...
29299  -1        NA    NA    NA       NA     NA        2013-12-31 

29300  33        NA    NA    NA       NA     NA        2013-12-31

I want to drop only the rows like 29299 and 29300, which contain 5 NAs from temp to pressure (these are consecutive columns), and keep the rows like 16983 and 26477.

desired result:

      dep_delay  temp  humid wind_dir precip pressure  date
16983  3         68.00 53.06 NA       0      1020.8    2013-05-07 

26477  42        NA  64.93 360      0        NA        2013-03-07

In other words, the problem is how to remove only the rows where there are at least 5 NAs in a row.

apparently this is not the right way to do it:

df <- df[!is.na(df$temp:df$pressure),]

Artem · Accepted Answer

Updated based on Yacine Jajji comment.

You can use standard filter function in dplyr package. You set the number of columns which should be never NA. In your case there are 2: dep_delay and date. Then calculate amount of NA in each row, if the number equals 5 the row will be filtered out. See the code below:

df <- read.table( text = "dep_delay  temp  humid wind_dir precip pressure  date
16983  3         68.00 53.06 NA       0      1020.8    2013-05-07 
26477  42        NA  64.93 360      0        NA        2013-03-07
29299  -1        NA    NA    NA       NA     NA        2013-12-31 
29300  33        NA    NA    NA       NA     NA        2013-12-31")

library(dplyr)

cols_to_remove <- c("temp", "humid", "wind_dir", "precip", "pressure")
df[rowSums(is.na(df[, cols_to_remove])) != 
     ncol(df[, cols_to_remove]), ]

Output:

      dep_delay temp humid wind_dir precip pressure       date
16983         3   68 53.06       NA      0   1020.8 2013-05-07
26477        42   NA 64.93      360      0       NA 2013-03-07

How to drop rows containing NA in specified columns?

Answers (1)

Related Questions