Patrick
Patrick

Reputation: 291

Use pivot_longer to cast data to long with repeated column names

I have a df that is will be of nonfinite length. Example below only has 2 traits: "density" and "lipids", but other dfs may have 50 or more traits. Each trait has 3 columns associated with it: value.trait, unit.trait, method.trait. Seems very similiar to this example in vignette But when I run the code below I keep getting an error: Input must be a vector, not NULL

3 rows of sample data

 x <- structure(list(geno_name = c("MB mixed", "MB mixed", "MB mixed"
), study_location = c("lab", "lab", "lab"), author = c("test", 
"test", "test"), value.lipids = c(NA, 2.361463603, 1.461189384
), unit.lipids = c(NA, "g cm^-2", "g cm^-2"), method.lipids = c(NA, 
"airbrush", "airbrush"), value.density = c(1.125257337, 0.816034359, 
0.49559013), unit.density = c("g cm^-3", "g cm^-3", "g cm^-3"
), method.density = c("3D scanning", "3D scanning", "3D scanning"
)), row.names = c(NA, 3L), class = "data.frame")

Current pivot code:

x %>%
  select(!c(study_location, author)) %>%
  pivot_longer(cols = !geno_name,
               names_to = c(".value", "trait"),
               names_sep = ".",
               values_drop_na = TRUE)

Error code:

Error: Input must be a vector, not NULL. Run rlang::last_error() to see where the error occurred. In addition: Warning messages: 1: In gsub(paste0("^", names_prefix), "", names(cols)) : argument 'pattern' has length > 1 and only the first element will be used 2: Expected 2 pieces. Additional pieces discarded in 6 rows [1, 2, 3, 4, 5, 6].

Upvotes: 4

Views: 1610

Answers (3)

akrun
akrun

Reputation: 887501

We can also do

tidyr::pivot_longer(x, 
         cols = c(lipids, density), 
         names_to = c('.value', 'trait'), 
         names_sep = '[.]', 
         values_drop_na = TRUE)

Upvotes: 2

Ronak Shah
Ronak Shah

Reputation: 389135

You can use pivot_longer as :

tidyr::pivot_longer(x, 
             cols = matches('lipids|density'), 
             names_to = c('.value', 'trait'), 
             names_sep = '\\.', 
             values_drop_na = TRUE)

#  geno_name study_location author trait   value unit    method     
#  <chr>     <chr>          <chr>  <chr>   <dbl> <chr>   <chr>      
#1 MB mixed  lab            test   density 1.13  g cm^-3 3D scanning
#2 MB mixed  lab            test   lipids  2.36  g cm^-2 airbrush   
#3 MB mixed  lab            test   density 0.816 g cm^-3 3D scanning
#4 MB mixed  lab            test   lipids  1.46  g cm^-2 airbrush   
#5 MB mixed  lab            test   density 0.496 g cm^-3 3D scanning

Upvotes: 1

Jon Spring
Jon Spring

Reputation: 66765

Here's an approach that first makes the data longer, then splits out traits from unit/method, then spreads those.

x %>% 
  janitor::clean_names() %>%    # This makes the column names distinct with #s
  pivot_longer(cols = -(1:2),
               names_to = "var",
               values_to = "val", 
               values_transform = list(val = as.character)) %>%
  mutate(trait = if_else(str_detect(var, "unit|method", negate = TRUE),
                         var, NA_character_),
         # the regex below is meant to remove everything starting with _
         stat = if_else(is.na(trait), var %>% str_remove("\\_[^.]*$"), "value")) %>%
  fill(trait) %>%
  select(-var) %>%
  pivot_wider(names_from = stat, values_from = val)

# A tibble: 2 x 6
  geno_name observation_id trait   value  unit    method     
  <chr>              <dbl> <chr>   <chr>  <chr>   <chr>      
1 MB mixed              10 lipids  NA     NA      NA         
2 MB mixed              10 density 1.125  g cm^-3 3D scanning

Upvotes: 1

Related Questions