Peter

Reputation: 12719

How does the names_to '.value' convention work for multiple observations per row in pivot_longer?

A recent SO answer, shamelessly copied, used tidyr::pivot_longer to reshape six variables into three.

I can understand the logic of all the pivot_longer arguments except for the '.value' entry in names_to.

I can work out what it does: it creates the new variable names from the first capture group of the regular expression in the names_pattern argument.

My question is: how does '.value' work?

I can see it is used in the examples section of the pivot_longer documentation under "Multiple observations per row", but the example gives no explanation.

It feels as if it could be a regex option, where . matches any character except \n. Or is it a 'pronoun' type of output, which seems to be common in the 'tidyverse', meaning something like 'the output or value of the regex expression'?

Any guidance or pointers where to find information on how to understand the intricacies of pivot_longer would be appreciated.

Or is it just a case of experimenting with the function and understanding what it does by doing?

Link to original question: pivot longer with multiple columns and values

library(tibble)
library(tidyr)


tib <- tibble(type = c(1L, 1L, 1L, 2L, 2L, 2L), 
              id = c(1L, 2L, 3L, 1L, 2L, 3L), 
              age2000 = c(20L, 35L, 24L, 32L, 66L, 14L), 
              age2001 = c(21L, 36L, 25L, 33L, 67L, 15L),
              age2002 = c(22L, 37L, 26L, 34L, 68L, 16L),
              bool2000 = c(1L, 2L, 1L, 2L, 2L, 1L),
              bool2001 = c(1L, 2L, 1L, 2L, 2L, 1L),
              bool2002 = c(1L, 2L, 1L, 2L, 2L, 1L))




pivot_longer(tib,
             cols = -c(id, type), 
             names_to = c('.value', 'year'),
             names_pattern = '([a-z]+)(\\d+)')
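
For contrast, here is a minimal sketch of the same pivot with and without '.value'. The tibble `small` and the column names `variable` and `value` are illustrative assumptions, not part of the original question. Without '.value', the first regex group becomes an ordinary key column and every cell is stacked into one value column; with '.value', the first regex group names the output columns instead.

```r
library(tibble)
library(tidyr)

# Smaller illustrative tibble using the same naming scheme as tib
small <- tibble(id = 1:2,
                age2000 = c(20L, 35L),
                age2001 = c(21L, 36L),
                bool2000 = c(1L, 2L),
                bool2001 = c(1L, 2L))

# Without '.value': "age"/"bool" land in a key column called variable,
# and all cell values are stacked into a single column called value
stacked <- pivot_longer(small,
                        cols = -id,
                        names_to = c("variable", "year"),
                        names_pattern = "([a-z]+)(\\d+)",
                        values_to = "value")

# With '.value': "age"/"bool" become the names of the value columns
with_value <- pivot_longer(small,
                           cols = -id,
                           names_to = c(".value", "year"),
                           names_pattern = "([a-z]+)(\\d+)")

names(stacked)     # includes variable, year and value
names(with_value)  # includes year, age and bool
```

The stacked version has one row per id, variable and year, while the '.value' version has one row per id and year with age and bool side by side.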

Upvotes: 2

Views: 1239

Answers (1)

NelsonGon

Reputation: 13319

From the source code, '.value' in names_to sets values_to to NULL, so that the output value columns are not named by values_to but by the matching part of each original column name.

If you look at this line:

  if (".value" %in% names_to) {
    values_to <- NULL
  }

Then:

  out <- tibble(.name = cols)
  out[[".value"]] <- values_to
  out <- vec_cbind(out, names)
  out
}

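Because values_to is NULL at this point, out[[".value"]] <- values_to adds nothing, and the .value column instead comes from names, i.e. the pieces extracted from the selected columns (everything except id and type) by names_pattern. Since the names are in the format age2000, names_pattern breaks age2000, for instance, into age and 2000. The latter goes into year, while '.value' ensures the former (age here) is kept as the name of the output column holding the values.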
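
One way to check this is to look at the spec that pivot_longer builds. The sketch below uses tidyr's exported build_longer_spec() on a small illustrative tibble (`small` is an assumption for brevity, not the data from the question) and then compares it with the pivoted result.

```r
library(tibble)
library(tidyr)

# Small illustrative tibble using the same naming scheme as the question
small <- tibble(id = 1:2,
                age2000 = c(20L, 35L),
                age2001 = c(21L, 36L),
                bool2000 = c(1L, 2L),
                bool2001 = c(1L, 2L))

# Build the spec without pivoting: one row per input column
spec <- build_longer_spec(small,
                          cols = -id,
                          names_to = c(".value", "year"),
                          names_pattern = "([a-z]+)(\\d+)")

spec$.name   # the original column names, e.g. "age2000"
spec$.value  # the first regex group: "age" or "bool"
spec$year    # the second regex group: "2000" or "2001"

# pivot_longer() then creates one output column per distinct .value
long <- pivot_longer(small,
                     cols = -id,
                     names_to = c(".value", "year"),
                     names_pattern = "([a-z]+)(\\d+)")
names(long)
```

The spec makes the mapping explicit: each input column is sent to the output column named in .value, on the row identified by year.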

Upvotes: 2
