Aslan
Aslan

Reputation: 11

Pivoting Longer in R to create more than one value column

First of all let me apologise if any of this is super basic or unclear - I'm very new to R, and struggling with something which seems as though it should be a very thing.

I have a fairly large dataset (data frame read in from a csv), and at the moment each individual is represented by a single row, with multiple observations (the columns). A simplified version would look something like this:

id <- c("1", "2", "3", "4")
hrar1 <- c(21, 22, 22, 41)
hrar2 <- c(24, 32, 33, 42)
hrar3 <- c(27, 35, 39, 51)
pai1 <- c(103, 103, 103, 103)
pai2 <- c(115, 115, 115, 115)
pai3 <- c(127, 127, 127, 127)

df <- data.frame(id, hrar1, hrar2, hrar3, pai1, pai2, pai3)

I am trying to correlate the change in hrar value with the increase in pai value. So I believe I need to reorganise the data to be in long format. Looking something more like this:

id2 <- c("1", "1", "1",  
         "2", "2", "2",  
         "3", "3", "3",  
         "4", "4", "4")
hrar <- c(21, 24, 27,
        22, 32, 35,
        22, 33, 39,
        41, 42, 51)
pai <- c(103, 115, 127,
         103, 115, 127,
         103, 115, 127,
         103, 115, 127)

df2 <- data.frame(id2, pai, hrar)

So all the values from the 'paixxx' columns in the original data frame have been gathered and the corresponding 'hrarxxx' value is next to it.

But I can't for the life of me figure out how to use the pivot_longer() function to do this. The new column (what I think is the 'names()' column), is not in the existing names of the data frame as per the examples I can find online, but is instead in the actual cell.

Any help would be much appreciated.

I have tried lots of different things, all using the pivot_longer() function, all of which throw errors... I have been using the online resources I can find, but don't seem to be able to find anything which helps me understand where I am going wrong.

Upvotes: 1

Views: 49

Answers (1)

stefan
stefan

Reputation: 124463

Using the special ".value" and the names_pattern argument of pivot_longer you could do:

library(tidyr)

df |> 
  pivot_longer(-id, names_to = c(".value", "id2"), names_pattern = "(.*?)(\\d+)")
#> # A tibble: 12 × 4
#>    id    id2    hrar   pai
#>    <chr> <chr> <dbl> <dbl>
#>  1 1     1        21   103
#>  2 1     2        24   115
#>  3 1     3        27   127
#>  4 2     1        22   103
#>  5 2     2        32   115
#>  6 2     3        35   127
#>  7 3     1        22   103
#>  8 3     2        33   115
#>  9 3     3        39   127
#> 10 4     1        41   103
#> 11 4     2        42   115
#> 12 4     3        51   127

EDIT

What we want to achieve is to reshape the dataset such that we end up with one column hrar containing the values of hrar1 to hrar3 and one column pai containing the values of pai1 to pai3.

First using ".value" we tell pivot_longer that we want multiple value columns and one name column (which I called id2). Or put differently we want to split the names of the columns we pivot in two parts. Using c(".value", "id2") we tell pivot_longer that the first part should become the name of the value column(s) and the second part the category of the name column.

Second we have to tell pivot_longer how to split the column names into two parts. That is done via the names_pattern argument using regular expressions. Here the () are used to define the two parts. The (\\d+) means that the second part consists only of digits. The (.?*) means that the first group consists of all characters to the left of the digits, e.g. using these patterns hrar3 gets splitted into hrar and 3.

Finally, the id2 column contains the information on the value's origin, i.e. whether the hrar value is from hrar2 or hrar3. If this is not needed you can drop that column or assign a more suitable name than id2.

Upvotes: 1

Related Questions