Taking a simple data frame, the built-in R dataset airquality, and checking its missing values:
airquality %>% summary
This works:
airquality %>% map_df(is.na) %>% map_df(sum)
Ozone Solar.R Wind Temp Month Day
<int> <int> <int> <int> <int> <int>
1 37 7 0 0 0 0
This, using purrr's formula (lambda) syntax, works too:
airquality %>% map_df(~sum(is.na(.)))
Ozone Solar.R Wind Temp Month Day
<int> <int> <int> <int> <int> <int>
1 37 7 0 0 0 0
But this does not return the NA counts:
airquality %>% map_df(sum(is.na(.)))
Ozone Solar.R Wind Temp Month Day
<int> <int> <dbl> <int> <int> <int>
1 23 148 8 82 6 13
My question is: how can the last result be explained?
Where exactly does the calculation happen: in dplyr or in purrr?
The behavior of the various syntaxes around %>% is explained in detail in help("%>%", package = "magrittr").
In this specific instance, sum(is.na(.)) is not interpreted as an anonymous function, as the OP seems to expect, so . is not the argument of an anonymous function. Instead, . is the LHS (left-hand side) of the pipe, i.e. airquality itself.
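The placeholder rule at work here can be seen on a toy vector. This is a small sketch using only magrittr; the variable names (x, with_first_arg, dot_only) are made up for illustration. When . appears only inside a nested call, magrittr still inserts the LHS as the first argument; wrapping the right-hand side in braces turns that first-argument insertion off.

```r
library(magrittr)

x <- 1:3

# The dot appears only inside a nested call (length(.)), so magrittr ALSO
# inserts x as the first argument: this evaluates c(x, length(x)).
with_first_arg <- x %>% c(length(.))
with_first_arg  # c(1L, 2L, 3L, 3L)

# Braces disable the first-argument rule: the dot is used only where written.
dot_only <- x %>% { length(.) }
dot_only        # 3L
```

map_df(sum(is.na(.))) falls into the first case: the dot is nested inside sum(is.na(.)), so it stands for airquality rather than for each column.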
Because . appears only inside a nested call, magrittr also passes the LHS as the first argument, so airquality %>% map_df(sum(is.na(.))) could be unfolded as map_df(airquality, .f = sum(is.na(airquality))).
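The unfolding can be checked directly. This sketch assumes dplyr and purrr are installed; dplyr is loaded both for %>% (re-exported from magrittr) and because map_df() binds its results with dplyr::bind_rows(). The names piped and unfolded are illustrative only.

```r
library(dplyr)  # provides %>%; map_df() relies on dplyr::bind_rows()
library(purrr)

# The pipe call from the question...
piped <- airquality %>% map_df(sum(is.na(.)))

# ...and its hand-unfolded form: the dot was replaced by airquality.
unfolded <- map_df(airquality, .f = sum(is.na(airquality)))

identical(piped, unfolded)
```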
sum(is.na(airquality)) evaluates to 44: 37 NAs in Ozone plus 7 in Solar.R.
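Where the 44 comes from can be seen with base R alone. is.na() on a data frame returns a logical matrix, so colSums() gives the per-column NA counts and sum() gives the total that ends up as .f. The variable name na_per_column is just for illustration.

```r
# Per-column NA counts (as doubles, since colSums() returns numeric)
na_per_column <- colSums(is.na(airquality))
na_per_column        # Ozone 37, Solar.R 7, every other column 0

# The single number that is passed to map_df() as .f
sum(na_per_column)   # 44
```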
According to help("map_df"), if the .f argument to map_df is a numeric vector, it is converted to an extractor function that indexes each element by position.
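A short sketch of that conversion, assuming purrr is installed. purrr::as_mapper() is the function that turns .f into a function; the name extract_44th is a hypothetical label for the result. Mapping a number over a list is the same as extracting that position from every element.

```r
library(purrr)

# A numeric .f picks element 2 of each list element
map(list(a = 10:12, b = 20:22), 2)  # list(a = 11L, b = 21L)

# as_mapper() shows the conversion explicitly
extract_44th <- as_mapper(44)
extract_44th(airquality$Ozone)      # 23

# Mapping 44 over the columns equals base R's positional extraction
same <- identical(map(airquality, 44), lapply(airquality, `[[`, 44))
same
```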
Summing up: this extracts the 44th element of each column and binds the results back into a data frame. Or, with some oversimplification, it extracts the 44th row.
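To confirm the "44th row" reading, the extracted values can be compared with base R row indexing. This sketch assumes dplyr and purrr are installed; the names extracted and row_44 are illustrative. The two objects differ in class (tibble vs data.frame) and row names, so only the values are compared.

```r
library(dplyr)  # map_df() relies on dplyr::bind_rows()
library(purrr)

# 44th element of every column, bound into a one-row tibble
extracted <- map_df(airquality, 44)

# The 44th row via base R indexing (a one-row data.frame)
row_44 <- airquality[44, ]

# Compare values only: unlist() flattens both into named numeric vectors
all.equal(unlist(extracted), unlist(row_44))
```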