Reputation: 4242
I have a data frame in a "wide" structure, where the values of multiple observations on a set of variables are stored across multiple columns in a single row. I'm trying to convert it to a long structure using tidyr::pivot_longer(). However, I receive the error "Failed to create output due to bad names." because one of the columns in the data frame I am passing to the pivoting function has the same name as at least one of the columns pivot_longer() wants to create when ".value" is passed to the names_to argument.
While this error guards against bad names, and one option is to rename the columns in the data before passing it to pivot_longer(), I am trying to find a way to handle this within the function itself. The names_repair argument can add numerical suffixes to the end of the names to avoid duplicate column names, but I would like to add my own string(s) instead of those suffixes.
Specifically, I am wondering if there is a way to use the names_to argument to create column names that avoid the names error while still using ".value". The motivation for doing so is to avoid passing a vector of column names to names_to. Alternatively, this may be a case where pivot_longer_spec() is appropriate; however, I'm not sure how to use that function in conjunction with ".value".
Minimal working example found below:
library(tidyr)
library(dplyr)
# Create example data
dat <- data.frame(
  foo_1a = 1:3,
  foo_1b = 1:3,
  foo_2a = 1:3,
  foo_2b = 1:3,
  bar_1a = 1:3,
  bar_1b = 1:3,
  bar_2a = 1:3,
  bar_2b = 1:3,
  cat = c("a","b","c"),
  dog = c("d","e","f")
)
# No error
dat %>% tidyr::pivot_longer(
  cols = ends_with(c("1a", "1b", "2a", "2b")),
  names_to = c(".value", "profile"),
  names_sep = "_"
)
# Add another variable that causes duplicate names
# when pivoted due to column name prefix
dat_fail <- dat %>% mutate(foo = 4:6)
# "Error: Failed to create output due to bad names"
# because the function tries to create foo when it's
# already in the data.
dat_fail %>% tidyr::pivot_longer(
  cols = ends_with(c("1a", "1b", "2a", "2b")),
  names_to = c(".value", "profile"),
  names_sep = "_"
)
# Attempt to fix #1: doesn't produce error
# but fails because it does not create columns
# foo and bar and instead places foo and bar
# in the .valuefiller column.
dat_fail %>% tidyr::pivot_longer(
  cols = ends_with(c("1a", "1b", "2a", "2b")),
  names_to = c(paste0(".value", "filler"), "profile"),
  names_sep = "_"
)
# Attempt to fix #2: try passing "unique" to
# repair argument, but doesn't work. Even so,
# this would append numeric suffixes when
# I want to be able to specify the suffix myself.
# Not sure if this is a bug.
dat_fail %>% tidyr::pivot_longer_spec(
  cols = ends_with(c("1a", "1b", "2a", "2b")),
  names_to = c(".value", "profile"),
  names_sep = "_",
  names_repair = "unique"
)
# Error in tidyr::pivot_longer_spec(., cols = ends_with(c("1a", "1b", "2a", :
# unused arguments ( cols = ends_with(c("1a", "1b", "2a", "2b")),
# names_to = c(".value", "profile"), names_sep = "_")
# Desired output
# (built by hand to show the target structure)
dat <- data.frame(
  cat = c("a","a","a","a","b","b","b","b","c","c","c","c"),
  dog = c("d","d","d","d","e","e","e","e","f","f","f","f"),
  foo = c(1,1,1,1,2,2,2,2,3,3,3,3),
  profile = rep(c("1a","1b","2a","2b"), 3),
  foo_suffix = c(4,4,4,4,5,5,5,5,6,6,6,6)
)
Upvotes: 1
Views: 754
Reputation: 13680
names_repair can accept a function taking the column names as input.
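As a rough illustration of what such a function sees, the sketch below applies one to a plain character vector. The vector is an assumption: it mirrors the order in which pivot_longer() appears to assemble names for the example data (existing columns first, then the new ones). `add_suffix` is a hypothetical name, not part of tidyr.

```r
# Assumed names vector: id columns, the clashing `foo`, then the columns
# pivot_longer() would add for names_to = c(".value", "profile").
nms <- c("cat", "dog", "foo", "profile", "foo", "bar")

# Hypothetical repair function: tag the earlier copy of each duplicated name.
add_suffix <- function(x, suffix = "suffix") {
  # fromLast = TRUE flags every occurrence except the last one
  clash <- duplicated(x, fromLast = TRUE)
  x[clash] <- paste(x[clash], suffix, sep = "_")
  x
}

repaired <- add_suffix(nms)
print(repaired)
```

Note that a name occurring three or more times would still collide after this repair, since every earlier copy receives the same suffix.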
We can use that to build the result you want. The following is just an example, and probably not a good one, but you can use it as is or write a function better suited to your use case:
library(tidyr)
library(dplyr)
# Create example data
dat <- data.frame(
  foo_1a = 1:3,
  foo_1b = 1:3,
  foo_2a = 1:3,
  foo_2b = 1:3,
  bar_1a = 1:3,
  bar_1b = 1:3,
  bar_2a = 1:3,
  bar_2b = 1:3,
  cat = c("a","b","c"),
  dog = c("d","e","f")
)
dat_fail <- dat %>% mutate(foo = 4:6)
dat_fail %>%
  pivot_longer(
    cols = ends_with(c("1a", "1b", "2a", "2b")),
    names_sep = '_',
    names_to = c(".value", "profile"),
    names_repair = ~ {
      .x[duplicated(.x, fromLast = TRUE)] <- paste(.x[duplicated(.x, fromLast = TRUE)], 'suffix', sep = '_')
      .x
    }
  )
#> New names:
#> * foo -> foo_suffix
#> # A tibble: 12 x 6
#>    cat   dog   foo_suffix profile   foo   bar
#>    <fct> <fct>      <int> <chr>   <int> <int>
#>  1 a     d              4 1a          1     1
#>  2 a     d              4 1b          1     1
#>  3 a     d              4 2a          1     1
#>  4 a     d              4 2b          1     1
#>  5 b     e              5 1a          2     2
#>  6 b     e              5 1b          2     2
#>  7 b     e              5 2a          2     2
#>  8 b     e              5 2b          2     2
#>  9 c     f              6 1a          3     3
#> 10 c     f              6 1b          3     3
#> 11 c     f              6 2a          3     3
#> 12 c     f              6 2b          3     3
Created on 2020-06-16 by the reprex package (v0.3.0)
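On the pivot_longer_spec() part of the question: that function takes a spec data frame rather than cols/names_to, which is why the call in the question fails with "unused arguments". A possible alternative, sketched here under the assumption that renaming the pivoted columns (rather than the existing foo) is acceptable, is to build the spec with build_longer_spec() and edit its .value column before pivoting. The "_val" suffix is an arbitrary choice for illustration.

```r
library(tidyr)

# Same example data as above, including the clashing `foo` column.
dat_fail <- data.frame(
  foo_1a = 1:3, foo_1b = 1:3, foo_2a = 1:3, foo_2b = 1:3,
  bar_1a = 1:3, bar_1b = 1:3, bar_2a = 1:3, bar_2b = 1:3,
  cat = c("a", "b", "c"),
  dog = c("d", "e", "f"),
  foo = 4:6
)

# Build the spec that pivot_longer() would otherwise build internally.
spec <- build_longer_spec(
  dat_fail,
  cols = ends_with(c("1a", "1b", "2a", "2b")),
  names_to = c(".value", "profile"),
  names_sep = "_"
)

# The `.value` column of the spec holds the output column names ("foo", "bar").
# Appending a string here (an illustrative choice) sidesteps the name clash.
spec$.value <- paste0(spec$.value, "_val")

out <- pivot_longer_spec(dat_fail, spec)
print(out)
```

This leaves the original foo column untouched and names the pivoted columns foo_val and bar_val, so it differs from the desired output in which column carries the suffix. If the existing column must be the one renamed, the names_repair function above is probably the closer fit.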
Upvotes: 3