Reputation: 291
I have a df that is will be of nonfinite length. Example below only has 2 traits: "density" and "lipids", but other dfs may have 50 or more traits. Each trait has 3 columns associated with it: value.trait, unit.trait, method.trait. Seems very similiar to this example in vignette But when I run the code below I keep getting an error: Input must be a vector, not NULL
3 rows of sample data
x <- structure(list(geno_name = c("MB mixed", "MB mixed", "MB mixed"
), study_location = c("lab", "lab", "lab"), author = c("test",
"test", "test"), value.lipids = c(NA, 2.361463603, 1.461189384
), unit.lipids = c(NA, "g cm^-2", "g cm^-2"), method.lipids = c(NA,
"airbrush", "airbrush"), value.density = c(1.125257337, 0.816034359,
0.49559013), unit.density = c("g cm^-3", "g cm^-3", "g cm^-3"
), method.density = c("3D scanning", "3D scanning", "3D scanning"
)), row.names = c(NA, 3L), class = "data.frame")
Current pivot code:
x %>%
select(!c(study_location, author)) %>%
pivot_longer(cols = !geno_name,
names_to = c(".value", "trait"),
names_sep = ".",
values_drop_na = TRUE)
Error code:
Error: Input must be a vector, not NULL. Run
rlang::last_error()
to see where the error occurred. In addition: Warning messages: 1: In gsub(paste0("^", names_prefix), "", names(cols)) : argument 'pattern' has length > 1 and only the first element will be used 2: Expected 2 pieces. Additional pieces discarded in 6 rows [1, 2, 3, 4, 5, 6].
Upvotes: 4
Views: 1610
Reputation: 887501
We can also do
tidyr::pivot_longer(x,
cols = c(lipids, density),
names_to = c('.value', 'trait'),
names_sep = '[.]',
values_drop_na = TRUE)
Upvotes: 2
Reputation: 389135
You can use pivot_longer
as :
tidyr::pivot_longer(x,
cols = matches('lipids|density'),
names_to = c('.value', 'trait'),
names_sep = '\\.',
values_drop_na = TRUE)
# geno_name study_location author trait value unit method
# <chr> <chr> <chr> <chr> <dbl> <chr> <chr>
#1 MB mixed lab test density 1.13 g cm^-3 3D scanning
#2 MB mixed lab test lipids 2.36 g cm^-2 airbrush
#3 MB mixed lab test density 0.816 g cm^-3 3D scanning
#4 MB mixed lab test lipids 1.46 g cm^-2 airbrush
#5 MB mixed lab test density 0.496 g cm^-3 3D scanning
Upvotes: 1
Reputation: 66765
Here's an approach that first makes the data longer, then splits out traits from unit/method, then spreads those.
x %>%
janitor::clean_names() %>% # This makes the column names distinct with #s
pivot_longer(cols = -(1:2),
names_to = "var",
values_to = "val",
values_transform = list(val = as.character)) %>%
mutate(trait = if_else(str_detect(var, "unit|method", negate = TRUE),
var, NA_character_),
# the regex below is meant to remove everything starting with _
stat = if_else(is.na(trait), var %>% str_remove("\\_[^.]*$"), "value")) %>%
fill(trait) %>%
select(-var) %>%
pivot_wider(names_from = stat, values_from = val)
# A tibble: 2 x 6
geno_name observation_id trait value unit method
<chr> <dbl> <chr> <chr> <chr> <chr>
1 MB mixed 10 lipids NA NA NA
2 MB mixed 10 density 1.125 g cm^-3 3D scanning
Upvotes: 1