States

Reputation: 163

Getting an error when trying to use "filter" function on several parameters

library(tidyverse)
library(nycflights13)

I want to select only the flights that have values in the given columns, so I want to drop the flights that have nulls (NA) in the columns dep_delay, arr_delay and distance.

I am getting an error saying: Error: Result must have length 1, not 3

This error is caused by this: filter(!is.na(c("dep_delay", "arr_delay", "distance")))

flights %>% 
    group_by(dep_delay, arr_delay, distance) %>% 
    filter(!is.na(c("dep_delay", "arr_delay", "distance"))) %>% 
    summarise()

I also tried filter(!is.na("dep_delay", "arr_delay", "distance")) (removing the c(...)).

Upvotes: 1

Views: 171

Answers (1)

akrun

Reputation: 887213

If there are multiple columns, use filter_at (assuming that we want to remove a row if any of those columns is NA in that row):

library(dplyr)
flights %>%         
     filter_at(vars(c("dep_delay", "arr_delay", "distance")), 
           all_vars(!is.na(.)))
# A tibble: 327,346 x 19   
#    year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time arr_delay carrier flight tailnum origin dest 
#   <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>     <dbl> <chr>    <int> <chr>   <chr>  <chr>
# 1  2013     1     1      517            515         2      830            819        11 UA        1545 N14228  EWR    IAH  
# 2  2013     1     1      533            529         4      850            830        20 UA        1714 N24211  LGA    IAH  
# 3  2013     1     1      542            540         2      923            850        33 AA        1141 N619AA  JFK    MIA  
# 4  2013     1     1      544            545        -1     1004           1022       -18 B6         725 N804JB  JFK    BQN  
# 5  2013     1     1      554            600        -6      812            837       -25 DL         461 N668DN  LGA    ATL  
# 6  2013     1     1      554            558        -4      740            728        12 UA        1696 N39463  EWR    ORD  
# 7  2013     1     1      555            600        -5      913            854        19 B6         507 N516JB  EWR    FLL  
# 8  2013     1     1      557            600        -3      709            723       -14 EV        5708 N829AS  LGA    IAD  
# 9  2013     1     1      557            600        -3      838            846        -8 B6          79 N593JB  JFK    MCO  
#10  2013     1     1      558            600        -2      753            745         8 AA         301 N3ALAA  LGA    ORD  
# … with 327,336 more rows, and 5 more variables: air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>,
#   time_hour <dttm>

In the devel version of dplyr, we can use across with filter:

flights %>% 
        filter(across(c(dep_delay, arr_delay, distance), ~ !is.na(.)))
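
In released versions of dplyr (1.0.4 and later, as far as I know), if_all() and if_any() are the helpers meant for this kind of filter() condition. A minimal sketch on a small made-up tibble (toy and its values are hypothetical, standing in for flights):

```r
library(dplyr)

# Hypothetical stand-in for flights, with NAs in the columns of interest
toy <- tibble(
  dep_delay = c(2, NA, 5, NA),
  arr_delay = c(11, 3, NA, NA),
  distance  = c(1400, 1416, 1089, NA)
)

# Keep rows where all three columns are non-NA
all_present <- toy %>%
  filter(if_all(c(dep_delay, arr_delay, distance), ~ !is.na(.x)))

# Keep rows where at least one of the three columns is non-NA
any_present <- toy %>%
  filter(if_any(c(dep_delay, arr_delay, distance), ~ !is.na(.x)))
```

On this toy data, all_present keeps only the first row, while any_present drops only the last row, where every column is NA.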

If the condition is to keep rows with at least one non-NA among those columns, replace all_vars with any_vars:

flights %>%            
          filter_at(vars(c("dep_delay", "arr_delay", "distance")), 
                any_vars(!is.na(.)))

NOTE: the group_by step can come after the filter step, since the filter uses the same columns.
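
As for why the original call errors: the column names are quoted, so is.na() is applied to a character vector of three strings rather than to the columns themselves. A small base R sketch of what that expression evaluates to (cond is just an illustrative name):

```r
# The quoted names form a plain character vector; none of the strings is NA,
# so the negated is.na() result is TRUE for each of the three strings
cond <- !is.na(c("dep_delay", "arr_delay", "distance"))
print(cond)
print(length(cond))
```

filter() expects a condition of length 1 or one value per row of the current group, so a length-3 result does not fit. The "must have length 1" wording presumably comes from a group that contains a single row.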

Upvotes: 1
