Reputation: 163
library(tidyverse)
library(nycflights13)
I want to keep only the flights that have values in certain columns, so I don't care about the flights that have NAs in the columns dep_delay, arr_delay and distance.
I am getting this error: Error: Result must have length 1, not 3
The error is caused by this line: filter(!is.na(c("dep_delay", "arr_delay", "distance")))
flights %>%
  group_by(dep_delay, arr_delay, distance) %>%
  filter(!is.na(c("dep_delay", "arr_delay", "distance"))) %>%
  summarise()
I also tried filter(!is.na("dep_delay", "arr_delay", "distance")), i.e. removing the c(...).
Upvotes: 1
Views: 171
Reputation: 887213
If there are multiple columns, use filter_at (assuming that we want to remove a row if any of those columns is NA in that row):
library(dplyr)
flights %>%
  filter_at(vars(c("dep_delay", "arr_delay", "distance")),
            all_vars(!is.na(.)))
# A tibble: 327,346 x 19
# year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time arr_delay carrier flight tailnum origin dest
# <int> <int> <int> <int> <int> <dbl> <int> <int> <dbl> <chr> <int> <chr> <chr> <chr>
# 1 2013 1 1 517 515 2 830 819 11 UA 1545 N14228 EWR IAH
# 2 2013 1 1 533 529 4 850 830 20 UA 1714 N24211 LGA IAH
# 3 2013 1 1 542 540 2 923 850 33 AA 1141 N619AA JFK MIA
# 4 2013 1 1 544 545 -1 1004 1022 -18 B6 725 N804JB JFK BQN
# 5 2013 1 1 554 600 -6 812 837 -25 DL 461 N668DN LGA ATL
# 6 2013 1 1 554 558 -4 740 728 12 UA 1696 N39463 EWR ORD
# 7 2013 1 1 555 600 -5 913 854 19 B6 507 N516JB EWR FLL
# 8 2013 1 1 557 600 -3 709 723 -14 EV 5708 N829AS LGA IAD
# 9 2013 1 1 557 600 -3 838 846 -8 B6 79 N593JB JFK MCO
#10 2013 1 1 558 600 -2 753 745 8 AA 301 N3ALAA LGA ORD
# … with 327,336 more rows, and 5 more variables: air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>,
# time_hour <dttm>
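As for why the original filter call fails: the column names are quoted, so is.na() receives a plain character vector rather than the columns of flights. A minimal sketch of what that expression evaluates to (the variable names cols and keep are only for illustration):

```r
# The quoted names form an ordinary character vector of length 3;
# nothing here refers to the columns of `flights`.
cols <- c("dep_delay", "arr_delay", "distance")

# None of the three strings is NA, so this is FALSE for each element.
is.na(cols)

# Negating gives a logical vector of length 3.
keep <- !is.na(cols)
length(keep)
# 3, whereas filter() expects one value per row of the current group (or a single value)
```

That length mismatch is what the "Result must have length 1, not 3" message reports.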
In the devel version of dplyr, we can use across with filter:
flights %>%
  filter(across(c(dep_delay, arr_delay, distance), ~ !is.na(.)))
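In newer dplyr releases (1.0.4 and later), if_all() is available for this kind of row filter, and using across() inside filter() has since been deprecated. Below is a minimal sketch on a small made-up tibble rather than flights; the object names toy and complete_rows and all the values are assumptions for illustration only.

```r
library(dplyr)

# Toy data standing in for `flights`; the values are invented.
toy <- tibble(
  dep_delay = c(2, NA, 5, NA),
  arr_delay = c(11, 20, NA, NA),
  distance  = c(1400, 1416, 1089, NA)
)

# Keep a row only if all three columns are non-NA in that row.
complete_rows <- toy %>%
  filter(if_all(c(dep_delay, arr_delay, distance), ~ !is.na(.x)))

nrow(complete_rows)
# 1 (only the first row has a value in every column)
```

The same call should work on flights with the column names unchanged.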
If the condition is to have at least one non-NA among those columns, replace all_vars with any_vars:
flights %>%
  filter_at(vars(c("dep_delay", "arr_delay", "distance")),
            any_vars(!is.na(.)))
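For the "at least one non-NA" case, if_any() (dplyr 1.0.4 and later) plays the role of any_vars. A small sketch under the same caveat as before: the tibble toy_any and its values are made up for illustration and are not taken from flights.

```r
library(dplyr)

# Invented data; the last row is NA in all three columns.
toy_any <- tibble(
  dep_delay = c(2, NA, 5, NA),
  arr_delay = c(11, 20, NA, NA),
  distance  = c(1400, 1416, 1089, NA)
)

# Keep a row if at least one of the three columns is non-NA.
some_value <- toy_any %>%
  filter(if_any(c(dep_delay, arr_delay, distance), ~ !is.na(.x)))

nrow(some_value)
# 3 (only the all-NA row is dropped)
```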
NOTE: the group_by step can come after the filter step, as we are using the same columns.
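A sketch of that ordering, filtering first and grouping afterwards, again on invented data (toy_grp, counts and the n column are hypothetical names; the original question calls summarise() with no arguments, so the count is added only to make the result visible):

```r
library(dplyr)

# Invented data: two identical complete rows plus two rows containing an NA.
toy_grp <- tibble(
  dep_delay = c(2, 2, NA, 5),
  arr_delay = c(11, 11, 20, NA),
  distance  = c(1400, 1400, 1416, 1089)
)

counts <- toy_grp %>%
  # Comma-separated conditions in filter() are combined with AND.
  filter(!is.na(dep_delay), !is.na(arr_delay), !is.na(distance)) %>%
  group_by(dep_delay, arr_delay, distance) %>%
  # Count the rows in each remaining group and drop the grouping afterwards.
  summarise(n = n(), .groups = "drop")

counts$n
# 2 (the two complete rows fall into the same group)
```

Filtering before grouping also avoids evaluating the NA check once per group.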
Upvotes: 1