Reputation: 3071
I can use tidyr::complete
to expose missing rows.
For example, with the following data.frame, I can expose the missing third quarter easily:
suppressPackageStartupMessages({
library(dplyr)
library(tidyr)
})
set.seed(42)
df <- data.frame(
id = c(rep(1, 3), rep(2, 3)),
year = rep(2020, 3),
quarter = c(1, 2, 4),
data = runif(3)
)
df %>% complete(nesting(id, year), quarter = 1:4)
#> # A tibble: 8 x 4
#> id year quarter data
#> <dbl> <dbl> <dbl> <dbl>
#> 1 1 2020 1 0.915
#> 2 1 2020 2 0.937
#> 3 1 2020 3 NA
#> 4 1 2020 4 0.286
#> 5 2 2020 1 0.915
#> 6 2 2020 2 0.937
#> 7 2 2020 3 NA
#> 8 2 2020 4 0.286
Created on 2020-03-02 by the reprex package (v0.3.0)
Now, I want to create a wrapper for this specific use-case: it takes in a data.frame with at least these four columns and exposes the missing quarters.
That's easy enough, just wrap the complete
call in a function:
expose <- function(df) {
complete(df, nesting(id, year), quarter = 1:4)
}
expose(df)
#> # A tibble: 8 x 4
#> id year quarter data
#> <dbl> <dbl> <dbl> <dbl>
#> 1 1 2020 1 0.915
#> 2 1 2020 2 0.937
#> 3 1 2020 3 NA
#> 4 1 2020 4 0.286
#> 5 2 2020 1 0.915
#> 6 2 2020 2 0.937
#> 7 2 2020 3 NA
#> 8 2 2020 4 0.286
However, the incoming data.frame might have other columns which are known to be constant for a given id
. In this case, the function doesn't work, since it naturally sets those column to NA
on the missing rows.
df <- data.frame(
id = c(rep(1, 3), rep(2, 3)),
name = c(rep("A", 3), rep("B", 3)),
country = c(rep("AU", 3), rep("BR", 3)),
year = rep(2020, 3),
quarter = c(1, 2, 4),
data = runif(3)
)
expose(df)
#> # A tibble: 8 x 6
#> id year quarter name country data
#> <dbl> <dbl> <dbl> <fct> <fct> <dbl>
#> 1 1 2020 1 A AU 0.830
#> 2 1 2020 2 A AU 0.642
#> 3 1 2020 3 <NA> <NA> NA
#> 4 1 2020 4 A AU 0.519
#> 5 2 2020 1 B BR 0.830
#> 6 2 2020 2 B BR 0.642
#> 7 2 2020 3 <NA> <NA> NA
#> 8 2 2020 4 B BR 0.519
To avoid this, I need to add those columns to the nesting
call.
If it were only one column, I could add an argument to the function for the column's name, which I'd then use with nesting(..., .data[[colname]])
. However, the .data
pronoun doesn't work with vectors (.data[c("name1", "name2")]
doesn't work).
So, how can I add multiple variable columns to the nesting
call?
Upvotes: 3
Views: 705
Reputation: 3071
The rlang
library includes the very powerful list2
object which allows for splicing via the "big bang" operator (!!!
). This allows us to pass nesting
a single object which is interpreted by the receiving function as a series of arguments.
Therefore, we can add a dots argument to the function which receives all the other columns to nest with, transform the names into symbols, and then pass that to nesting
:
suppressPackageStartupMessages({
library(dplyr)
library(rlang)
library(tidyr)
})
set.seed(42)
expose <- function(df, ...) {
x <- list2(...)
x <- lapply(x, sym)
complete(df, nesting(id, year, !!!x), quarter = 1:4)
}
df <- data.frame(
id = c(rep(1, 3), rep(2, 3)),
name = c(rep("A", 3), rep("B", 3)),
country = c(rep("AU", 3), rep("BR", 3)),
year = rep(2020, 3),
quarter = c(1, 2, 4),
data = runif(3)
)
expose(df, "name", "country")
#> # A tibble: 8 x 6
#> id year name country quarter data
#> <dbl> <dbl> <fct> <fct> <dbl> <dbl>
#> 1 1 2020 A AU 1 0.915
#> 2 1 2020 A AU 2 0.937
#> 3 1 2020 A AU 3 NA
#> 4 1 2020 A AU 4 0.286
#> 5 2 2020 B BR 1 0.915
#> 6 2 2020 B BR 2 0.937
#> 7 2 2020 B BR 3 NA
#> 8 2 2020 B BR 4 0.286
Created on 2020-03-02 by the reprex package (v0.3.0)
Upvotes: 0
Reputation: 6921
If you look at tidyr::nesting
, you'll see it relies on tidyr:::dots_cols
which relies on rlang
to interpret the column names (in particular, rlang::enquos
).
The best way to interface with tidyr::nesting
is therefore to use an rlang
construct.
library(dplyr)
library(tidyr)
expose <- function(df, ...) {
dots <- rlang::exprs(id, year, ...)
complete(df, nesting(!!! dots), quarter = 1:4)
}
df <- data.frame(
id = c(rep(1, 3), rep(2, 3)),
name = c(rep("A", 3), rep("B", 3)),
country = c(rep("AU", 3), rep("BR", 3)),
year = rep(2020, 3),
quarter = c(1, 2, 4),
data = runif(3)
)
expose(df)
#> # A tibble: 8 x 6
#> id year quarter name country data
#> <dbl> <dbl> <dbl> <fct> <fct> <dbl>
#> 1 1 2020 1 A AU 0.0417
#> 2 1 2020 2 A AU 0.365
#> 3 1 2020 3 <NA> <NA> NA
#> 4 1 2020 4 A AU 0.690
#> 5 2 2020 1 B BR 0.0417
#> 6 2 2020 2 B BR 0.365
#> 7 2 2020 3 <NA> <NA> NA
#> 8 2 2020 4 B BR 0.690
expose(df, name)
#> # A tibble: 8 x 6
#> id year name quarter country data
#> <dbl> <dbl> <fct> <dbl> <fct> <dbl>
#> 1 1 2020 A 1 AU 0.0417
#> 2 1 2020 A 2 AU 0.365
#> 3 1 2020 A 3 <NA> NA
#> 4 1 2020 A 4 AU 0.690
#> 5 2 2020 B 1 BR 0.0417
#> 6 2 2020 B 2 BR 0.365
#> 7 2 2020 B 3 <NA> NA
#> 8 2 2020 B 4 BR 0.690
expose(df, name, country)
#> # A tibble: 8 x 6
#> id year name country quarter data
#> <dbl> <dbl> <fct> <fct> <dbl> <dbl>
#> 1 1 2020 A AU 1 0.0417
#> 2 1 2020 A AU 2 0.365
#> 3 1 2020 A AU 3 NA
#> 4 1 2020 A AU 4 0.690
#> 5 2 2020 B BR 1 0.0417
#> 6 2 2020 B BR 2 0.365
#> 7 2 2020 B BR 3 NA
#> 8 2 2020 B BR 4 0.690
Created on 2020-03-02 by the reprex package (v0.3.0)
Upvotes: 2