Reputation: 343
There is something about the use of a dot as an argument that I'm not getting.
I'm using the starwars dataset that ships with the tidyverse packages.
With this code:
starwars %>%
select(height, gender) %>%
filter(!complete.cases(.))
I get this output:
# A tibble: 9 x 2
height gender
<int> <chr>
1 167 NA
2 96 NA
3 97 NA
4 NA male
5 NA male
6 NA female
7 NA male
8 NA none
9 NA female
And yet for reasons that I don't understand, I can't get the same output using this code:
starwars %>%
select(height, gender) %>%
filter(is.na(.))
Why does the . argument work for complete.cases but not for is.na? I've seen a dot being used quite a lot but have never really understood what it's doing.
Upvotes: 0
Views: 203
Reputation: 24810
In this case, . represents the entire data.frame.
For filter, you need a logical vector with one element per row of the data, but instead you're getting a matrix that has the same number of rows but two columns.
is.na(starwars %>% select(height, gender))
# height gender
# [1,] FALSE FALSE
# [2,] FALSE FALSE
# [3,] FALSE FALSE
# [4,] FALSE FALSE
# [5,] FALSE FALSE
#... And 82 more rows
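To make the shape difference concrete, here is a minimal base R sketch. The data frame df is a hypothetical toy example, not the starwars data; it only illustrates that is.na gives one value per cell while complete.cases gives one value per row.

```r
# Hypothetical toy data frame (an assumption for illustration only)
df <- data.frame(
  height = c(167L, NA, 96L, NA),
  gender = c(NA, "male", "female", NA),
  stringsAsFactors = FALSE
)

na_matrix <- is.na(df)            # logical matrix: one cell per value
row_flags <- !complete.cases(df)  # logical vector: one value per row

dim(na_matrix)     # rows x columns of the matrix
length(row_flags)  # one element per row

# Collapsing the matrix row-wise recovers the per-row answer
any_na <- rowSums(na_matrix) > 0
identical(unname(any_na), row_flags)
```

The last line is the key point: filter wants something shaped like row_flags, and the matrix has to be collapsed across columns before it can play that role.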
Remember that the %>% operator is also passing . as the first argument to filter. So you could think of this as:
filter(starwars %>% select(height, gender), is.na(starwars %>% select(height, gender)))
Here's an approach that is similar to yours:
starwars %>%
select(height, gender) %>%
filter(Reduce(`|`, as.data.frame(is.na(.))))
## A tibble: 9 x 2
# height gender
# <int> <chr>
#1 NA masculine
#2 183 NA
#3 183 NA
#4 178 NA
#5 NA masculine
#6 NA feminine
#7 NA masculine
#8 NA masculine
#9 NA NA
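For readers unsure what the Reduce call is doing, the following base R sketch unpacks it step by step. Again, df is a hypothetical toy data frame rather than the starwars data, and the intermediate names na_cols and keep are introduced here only for illustration.

```r
# Hypothetical toy data frame (an assumption for illustration only)
df <- data.frame(
  height = c(167L, NA, 96L, NA),
  gender = c(NA, "male", "female", NA),
  stringsAsFactors = FALSE
)

# Turn the logical matrix back into a list of logical columns
na_cols <- as.data.frame(is.na(df))

# Reduce() folds the columns together with `|`,
# which here amounts to na_cols$height | na_cols$gender
keep <- Reduce(`|`, na_cols)

keep         # one logical value per row
df[keep, ]   # rows with at least one NA
```

With more than two columns, Reduce keeps applying `|` to the running result and the next column, so a row is kept as soon as any of its columns is NA.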
Upvotes: 2
Reputation: 462
The dot . is a placeholder for the entire data.frame in this scenario.
complete.cases is a function made specifically to work with data.frames: it looks for rows that contain NA and returns a logical vector, one value per row, for filter to use.
is.na, on the other hand, works element-wise. Applied to a data.frame, it returns a logical matrix rather than one value per row. In the output below, only the first column of that result ends up being used by filter to subset the data.frame.
library(tidyverse)
starwars %>%
select(height, gender) %>%
filter(!complete.cases(.))
#> # A tibble: 9 x 2
#> height gender
#> <int> <chr>
#> 1 NA masculine
#> 2 183 <NA>
#> 3 183 <NA>
#> 4 178 <NA>
#> 5 NA masculine
#> 6 NA feminine
#> 7 NA masculine
#> 8 NA masculine
#> 9 NA <NA>
# this only filters on the first column:
# is.na works element-wise, so it does not
# collapse a data.frame down to a single
# TRUE/FALSE per row
starwars %>%
select(height, gender) %>%
filter(is.na(.))
#> # A tibble: 6 x 2
#> height gender
#> <int> <chr>
#> 1 NA masculine
#> 2 NA masculine
#> 3 NA feminine
#> 4 NA masculine
#> 5 NA masculine
#> 6 NA <NA>
# same as
starwars %>%
select(height, gender) %>%
filter(is.na(height))
#> # A tibble: 6 x 2
#> height gender
#> <int> <chr>
#> 1 NA masculine
#> 2 NA masculine
#> 3 NA feminine
#> 4 NA masculine
#> 5 NA masculine
#> 6 NA <NA>
# switching first column changes it
starwars %>%
select(gender, height) %>%
filter(is.na(.))
#> # A tibble: 4 x 2
#> gender height
#> <chr> <int>
#> 1 <NA> 183
#> 2 <NA> 183
#> 3 <NA> 178
#> 4 <NA> NA
# same as
starwars %>%
select(gender, height) %>%
filter(is.na(gender))
#> # A tibble: 4 x 2
#> gender height
#> <chr> <int>
#> 1 <NA> 183
#> 2 <NA> 183
#> 3 <NA> 178
#> 4 <NA> NA
Created on 2020-06-25 by the reprex package (v0.3.0)
Upvotes: 1