Reputation: 11
First of all let me apologise if any of this is super basic or unclear - I'm very new to R, and struggling with something which seems as though it should be a very thing.
I have a fairly large dataset (data frame read in from a csv), and at the moment each individual is represented by a single row, with multiple observations (the columns). A simplified version would look something like this:
id <- c("1", "2", "3", "4")
hrar1 <- c(21, 22, 22, 41)
hrar2 <- c(24, 32, 33, 42)
hrar3 <- c(27, 35, 39, 51)
pai1 <- c(103, 103, 103, 103)
pai2 <- c(115, 115, 115, 115)
pai3 <- c(127, 127, 127, 127)
df <- data.frame(id, hrar1, hrar2, hrar3, pai1, pai2, pai3)
I am trying to correlate the change in hrar value with the increase in pai value. So I believe I need to reorganise the data to be in long format. Looking something more like this:
id2 <- c("1", "1", "1",
"2", "2", "2",
"3", "3", "3",
"4", "4", "4")
hrar <- c(21, 24, 27,
22, 32, 35,
22, 33, 39,
41, 42, 51)
pai <- c(103, 115, 127,
103, 115, 127,
103, 115, 127,
103, 115, 127)
df2 <- data.frame(id2, pai, hrar)
So all the values from the 'paixxx' columns in the original data frame have been gathered and the corresponding 'hrarxxx' value is next to it.
But I can't for the life of me figure out how to use the pivot_longer() function to do this. The new column (what I think is the 'names()' column), is not in the existing names of the data frame as per the examples I can find online, but is instead in the actual cell.
Any help would be much appreciated.
I have tried lots of different things, all using the pivot_longer() function, all of which throw errors... I have been using the online resources I can find, but don't seem to be able to find anything which helps me understand where I am going wrong.
Upvotes: 1
Views: 49
Reputation: 124463
Using the special ".value"
and the names_pattern
argument of pivot_longer
you could do:
library(tidyr)
df |>
pivot_longer(-id, names_to = c(".value", "id2"), names_pattern = "(.*?)(\\d+)")
#> # A tibble: 12 × 4
#> id id2 hrar pai
#> <chr> <chr> <dbl> <dbl>
#> 1 1 1 21 103
#> 2 1 2 24 115
#> 3 1 3 27 127
#> 4 2 1 22 103
#> 5 2 2 32 115
#> 6 2 3 35 127
#> 7 3 1 22 103
#> 8 3 2 33 115
#> 9 3 3 39 127
#> 10 4 1 41 103
#> 11 4 2 42 115
#> 12 4 3 51 127
EDIT
What we want to achieve is to reshape the dataset such that we end up with one column hrar
containing the values of hrar1
to hrar3
and one column pai
containing the values of pai1
to pai3
.
First using ".value"
we tell pivot_longer
that we want multiple value columns and one name
column (which I called id2
). Or put differently we want to split the names of the columns we pivot in two parts. Using c(".value", "id2")
we tell pivot_longer
that the first part should become the name of the value column(s) and the second part the category of the name column.
Second we have to tell pivot_longer
how to split the column names into two parts. That is done via the names_pattern
argument using regular expressions. Here the ()
are used to define the two parts. The (\\d+)
means that the second part consists only of digits. The (.?*)
means that the first group consists of all characters to the left of the digits, e.g. using these patterns hrar3
gets splitted into hrar
and 3
.
Finally, the id2
column contains the information on the value's origin, i.e. whether the hrar
value is from hrar2
or hrar3
. If this is not needed you can drop that column or assign a more suitable name than id2
.
Upvotes: 1