Greg Martin

Reputation: 343

Using a dot as an argument in

There is something about the use of a dot as an argument that I'm not getting.

I'm using the starwars dataset that ships with dplyr (one of the tidyverse packages).

Using this code:

library(dplyr)

starwars %>% 
    select(height, gender) %>% 
    filter(!complete.cases(.))

I get this output:

# A tibble: 9 x 2
  height gender
   <int> <chr> 
1    167 NA    
2     96 NA    
3     97 NA    
4     NA male  
5     NA male  
6     NA female
7     NA male  
8     NA none  
9     NA female

And yet for reasons that I don't understand, I can't get the same output using this code:

starwars %>% 
    select(height, gender) %>% 
    filter(is.na(.))

Why does the . argument work for complete.cases() but not for is.na()? I've seen the dot used quite a lot but have never really understood what it's doing.

Upvotes: 0

Views: 203

Answers (2)

Ian Campbell

Reputation: 24810

In this case . represents the entire data.frame.

filter() needs a logical vector with one element per row of the data. is.na(.) instead returns a logical matrix with the same number of rows but two columns, one per selected variable.

is.na(starwars %>% select(height, gender))
#       height gender
# [1,]  FALSE  FALSE
# [2,]  FALSE  FALSE
# [3,]  FALSE  FALSE
# [4,]  FALSE  FALSE
# [5,]  FALSE  FALSE
#... And 82 more rows
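A minimal base R sketch of the same point, using a small hypothetical data frame (df is made up for illustration and is not part of starwars): is.na() returns a matrix shaped like the data, while complete.cases() returns one logical value per row, which is the shape filter() expects.

```r
# Hypothetical data frame with missing values in both columns
df <- data.frame(
  height = c(167L, NA, 183L, NA),
  gender = c(NA, "male", "female", NA)
)

# is.na() is element-wise: one TRUE/FALSE per cell, i.e. a 4 x 2 logical matrix
na_matrix <- is.na(df)
is.matrix(na_matrix)  # TRUE
dim(na_matrix)        # 4 2

# complete.cases() is row-wise: one TRUE/FALSE per row (TRUE = no NA in that row)
row_ok <- complete.cases(df)
row_ok                # FALSE FALSE TRUE FALSE
length(row_ok) == nrow(df)  # TRUE
```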

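Remember that %>% also supplies the data as the first argument to filter(): because . appears only inside a nested call (is.na(.)) rather than as a direct argument, magrittr still inserts the left-hand side first. So you could think of this as: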

filter(starwars %>% select(height, gender), is.na(starwars %>% select(height, gender)))
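A minimal sketch of that magrittr rule on its own, assuming the magrittr package (which provides %>%) is installed; the vector v is hypothetical. When the dot appears only inside a nested call, the left-hand side is still inserted as the first argument; when the dot is used directly as an argument, it is not.

```r
library(magrittr)

v <- c(10, 20, 30)  # hypothetical example vector

# Dot only inside a nested call: v is still inserted first,
# so this is evaluated as c(v, length(v))
nested <- v %>% c(length(.))

# Dot used directly as an argument: v is NOT inserted first,
# so this is evaluated as c(99, v)
direct <- v %>% c(99, .)

nested  # c(10, 20, 30, 3)
direct  # c(99, 10, 20, 30)
```

The filter(is.na(.)) call in the question falls under the first case, which is why the data frame ends up as both the first argument and the input to is.na().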

Here's an approach that is similar to yours:

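starwars %>% 
  select(height, gender) %>%
  # is.na(.) gives a logical matrix; OR its columns together
  # so a row is kept when any of its values is NA
  filter(Reduce(`|`, as.data.frame(is.na(.))))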
## A tibble: 9 x 2
#  height gender   
#   <int> <chr>    
#1     NA masculine
#2    183 NA       
#3    183 NA       
#4    178 NA       
#5     NA masculine
#6     NA feminine 
#7     NA masculine
#8     NA masculine
#9     NA NA       
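Another way to collapse the matrix to one value per row, shown as a base R sketch on a small hypothetical data frame: rowSums() counts the NAs in each row of the is.na() matrix, so rowSums(...) > 0 flags rows with at least one missing value. It gives the same vector as the Reduce() version and as !complete.cases(); inside the pipe it would read filter(rowSums(is.na(.)) > 0).

```r
# Hypothetical data frame with missing values in both columns
df <- data.frame(
  height = c(167L, NA, 183L, NA),
  gender = c(NA, "male", "female", NA)
)

na_mat <- is.na(df)  # logical matrix, one column per variable

# OR the columns together (the approach above)
any_na_reduce <- Reduce(`|`, as.data.frame(na_mat))

# Count NAs per row and flag rows with at least one
any_na_rowsums <- rowSums(na_mat) > 0

# Rows that are not complete
any_na_cc <- !complete.cases(df)

any_na_rowsums  # TRUE TRUE FALSE TRUE
df[any_na_rowsums, ]  # keeps rows 1, 2 and 4
```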

Upvotes: 2

Athanasia Mowinckel

Reputation: 462

The dot . is a placeholder for the entire data frame in this scenario. complete.cases() works row by row: it checks each row for missing values and returns a logical vector with one value per row (TRUE when the row is complete), which is exactly what filter() needs.

is.na(), on the other hand, works element by element. Applied to a data frame it returns a logical matrix with one column per variable instead of one value per row, and in this case filter() uses only the first column of that matrix to subset the data frame.

library(tidyverse)

starwars %>% 
  select(height, gender) %>% 
  filter(!complete.cases(.))
#> # A tibble: 9 x 2
#>   height gender   
#>    <int> <chr>    
#> 1     NA masculine
#> 2    183 <NA>     
#> 3    183 <NA>     
#> 4    178 <NA>     
#> 5     NA masculine
#> 6     NA feminine 
#> 7     NA masculine
#> 8     NA masculine
#> 9     NA <NA>

# only filters on the first column;
# it doesn't check the other columns,
# because is.na(.) returns a matrix here
# rather than one logical value per row
starwars %>% 
  select(height, gender) %>% 
  filter(is.na(.))
#> # A tibble: 6 x 2
#>   height gender   
#>    <int> <chr>    
#> 1     NA masculine
#> 2     NA masculine
#> 3     NA feminine 
#> 4     NA masculine
#> 5     NA masculine
#> 6     NA <NA>

# same as
starwars %>% 
  select(height, gender) %>% 
  filter(is.na(height))
#> # A tibble: 6 x 2
#>   height gender   
#>    <int> <chr>    
#> 1     NA masculine
#> 2     NA masculine
#> 3     NA feminine 
#> 4     NA masculine
#> 5     NA masculine
#> 6     NA <NA>

# changing which column comes first changes the result
starwars %>% 
  select(gender, height) %>% 
  filter(is.na(.))
#> # A tibble: 4 x 2
#>   gender height
#>   <chr>   <int>
#> 1 <NA>      183
#> 2 <NA>      183
#> 3 <NA>      178
#> 4 <NA>       NA

# same as
starwars %>% 
  select(gender, height) %>% 
  filter(is.na(gender))
#> # A tibble: 4 x 2
#>   gender height
#>   <chr>   <int>
#> 1 <NA>      183
#> 2 <NA>      183
#> 3 <NA>      178
#> 4 <NA>       NA

Created on 2020-06-25 by the reprex package (v0.3.0)
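A base R sketch that mimics what the reprex shows, using a small hypothetical data frame: keeping the rows flagged in the first column of the is.na() matrix selects the same rows as testing the first variable directly, and reordering the columns changes which variable is tested. This only imitates the observed result; it is not how dplyr works internally, and other dplyr versions may treat a matrix condition differently.

```r
# Hypothetical data frame with missing values in both columns
df <- data.frame(
  height = c(167L, NA, 183L, NA),
  gender = c(NA, "male", "female", NA)
)

# Keep rows flagged in the FIRST column of the is.na() matrix only
first_col_only <- df[is.na(df)[, 1], ]

# Same rows as testing the first variable (height) directly
by_height <- df[is.na(df$height), ]
identical(first_col_only, by_height)  # TRUE

# With gender first, the first matrix column now tests gender instead
swapped <- df[, c("gender", "height")]
gender_first <- swapped[is.na(swapped)[, 1], ]
rownames(gender_first)  # "1" "4": the rows where gender is NA
```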

Upvotes: 1
