Display name

Reputation: 4481

Unable to use tidyselect `everything()` in combination with `group_by()` and `fill()`

library(tidyverse)
df <- tibble(x1 = c("A", "A", "A", "B", "B", "B"),
             x2 = c(NA, 8, NA, NA, NA, 5),
             x3 = c(3, NA, 5, 9, 1, 9))
#> # A tibble: 6 x 3
#>   x1       x2    x3
#>   <chr> <dbl> <dbl>
#> 1 A        NA     3
#> 2 A         8    NA
#> 3 A        NA     5
#> 4 B        NA     9
#> 5 B        NA     1
#> 6 B         5     9

I have groups 'A' and 'B' in column x1. I need the NA values in columns x2 and x3 to be filled only from values within the same group, using the "updown" direction. That part is simple enough:

df %>% group_by(x1) %>% fill(c(x2, x3), .direction = "updown")
#> # A tibble: 6 x 3
#>   x1       x2    x3
#>   <chr> <dbl> <dbl>
#> 1 A         8     3
#> 2 A         8     5
#> 3 A         8     5
#> 4 B         5     9
#> 5 B         5     1
#> 6 B         5     9

My real-life issue is that my data frame doesn't contain just columns x1 through x3. It's more like x1 through x100, and the column names are effectively random, in no logical order. To avoid typing all ~100 column names, I tried the tidyselect helper everything(), as shown below. That yields an understandable error, since everything() also selects the grouping column x1, but I don't know how to work around it.

df %>% group_by(x1) %>% fill(everything(), .direction = "updown")
#> Error: Column `x1` can't be modified because it's a grouping variable

I asked a related question yesterday about naming exceptions to everything(), but my example was too simple, which caused confusion about what I wanted from a solution. The solution proposed there, "you can use select(-variable)", won't work in the case outlined above (I believe). Hence this new question. What do I do?

I should also mention that simply selecting the columns by position (i.e. 2:100) won't work, because I need to exclude some columns by name (e.g. x45, x70), and the order of the columns can change from month to month. So what I really want is something like everything() with an option such as everything_but(column.names = c(x45, x70)). Does it exist?

Upvotes: 2

Views: 517

Answers (1)

tmfmnk

Reputation: 39858

You can do:

df %>%
  group_by(x1) %>%
  fill(-x1, .direction = "updown")

  x1       x2    x3
  <chr> <dbl> <dbl>
1 A         8     3
2 A         8     5
3 A         8     5
4 B         5     9
5 B         5     1
6 B         5     9

This behavior is described in the tidyr documentation for fill() (see also the comment from @Gregor):

You can supply bare variable names, select all variables between x and z with x:z, exclude y with -y.
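
The same negative selection should extend to several columns at once, which is close to the everything_but() idea from the question. The following is a minimal sketch, not taken from the answer: it assumes the example data has an NA in row 2 of x3 (as in the printed tibble), uses x3 as a stand-in for the columns to leave alone (x45, x70 in the real data), and assumes tidyselect 1.0.0 or later for all_of().

```r
library(dplyr)
library(tidyr)

# Example data, assumed to match the tibble printed in the question
df <- tibble(x1 = c("A", "A", "A", "B", "B", "B"),
             x2 = c(NA, 8, NA, NA, NA, 5),
             x3 = c(3, NA, 5, 9, 1, 9))

# Fill every column except the grouping column x1 and the excluded column x3
filled <- df %>%
  group_by(x1) %>%
  fill(-c(x1, x3), .direction = "updown") %>%
  ungroup()

# Same idea with the excluded names held in a character vector
# (`skip` is a hypothetical name introduced for this sketch)
skip <- c("x1", "x3")
filled_by_name <- df %>%
  group_by(x1) %>%
  fill(-all_of(skip), .direction = "updown") %>%
  ungroup()
```

Because the exclusions are given by name, this does not depend on column order, so it should keep working if the columns are rearranged from month to month.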

Upvotes: 3
