KIM
KIM

Reputation: 312

Using pivot_longer from tidyr to create a long format data with one variable nested in another variable

This is the edited version of the question.

I need help to convert my wide data to long format data using the pivot_longer() function in R. The main problem is wanting to create long data with a variable nested in another variable.

For example, if I have wide data like this, where

IDno <- c(1,2)
Sex <- c("M","F")
fu1 <- c(13,15)
fu2 <- c(20,18)
cpass1 <- c(27, 85)
cpass2 <- c(33, 90)
is1 <- c(201, 400)
is2 <- c(220, 430)
mydata <- data.frame(IDno, Sex, 
                     fu1, cpass1, is1, 
                     fu2, cpass2, is2)
mydata

which looks like this

enter image description here

And now, I want to convert it to long format data, and it should look like this:

enter image description here

I have tried the codes below, but they do not produce the data frame in the format that I want:

#renaming variables
mydata_wide <- mydata %>%
  rename(fu1_day = fu1,
         cp_one = cpass1,
         is_one = is1,
         fu2_day = fu2,
         cp_two = cpass2,
         is_two = is2)

#pivoting
mydata_wide %>%
  pivot_longer(
    cols = c(fu1_day, fu2_day),
    names_to = c("fu", ".value"),
    values_to = "day",
    names_sep = "_") %>% 
  pivot_longer(
    cols = c("cp_one", "is_one", "cp_two", "is_two"),
    names_to = c("test", ".value"),
    values_to = "value", 
    names_sep = "_") 

The data frame, unfortunately, looks like this:

enter image description here

I have looked at some tutorials but have not found the best solution for this problem. Any help is very much appreciated.

Upvotes: 0

Views: 3906

Answers (2)

KIM
KIM

Reputation: 312

I took inspiration from almost similar post from How to reshape Panel / Longitudinal survey data from wide to long format using pivot_longer and from the solution provided by RobertoT and put together these codes:

STEP 1: Generate wide data for simulation

IDno <- c(1,2)
Sex <- c("M","F")
fu1_day <- c(13,15)
fu2_day <- c(20,18)
fu1_cpass <- c(27, 85)
fu2_cpass <- c(33, 90)
fu1_is <- c(201, 400)
fu2_is <- c(220, 430)
mydata_wide <- data.frame(IDno, Sex, 
                     fu1_day, fu1_cpass, fu1_is, 
                     fu2_day, fu2_cpass, fu2_is)
mydata_wide

enter image description here

STEP 1: CONVERT TO LONG DATA (out1)

out1 <- mydata_wide %>% 
  select(IDno, contains("day")) %>%
  pivot_longer(cols = c(fu1_day, fu2_day),
               names_to = c('fu', '.value'), 
               names_sep="_") 
out1

enter image description here

STEP 2: CREATE ANOTHER LONG DATA AND JOIN WITH out1

mydata_wide %>% 
  select(-contains('day')) %>% 
  pivot_longer(cols = -c(IDno, Sex),
               names_to = c('fu', 'test'),
               names_sep="_") %>%
  left_join(out1) 

The result looks like this

enter image description here

Upvotes: 1

RobertoT
RobertoT

Reputation: 1683

library(tidyverse)
mydata %>% # the "nested" pivoting must be done within two calls
  pivot_longer(cols=c(fu1,fu2),names_to = 'fu', values_to = 'day') %>% 
  pivot_longer(cols=c(starts_with('cpass'), starts_with('is')), 
    names_to = 'test', values_to = 'value') %>%
  # with this filter check not mixing the tests and the follow-ups
  filter(str_extract(fu,"\\d") == str_extract(test,"\\d")) %>% 
  mutate(test = gsub("\\d","",test)) # remove numbers in strings

Output:

# A tibble: 8 × 6
   IDno Sex   fu      day test  value
  <dbl> <chr> <chr> <dbl> <chr> <dbl>
1     1 M     fu1      13 cpass    27
2     1 M     fu1      13 is      201
3     1 M     fu2      20 cpass    33
4     1 M     fu2      20 is      220
5     2 F     fu1      15 cpass    85
6     2 F     fu1      15 is      400
7     2 F     fu2      18 cpass    90
8     2 F     fu2      18 is      430

I'm not sure if your example is your real expected output, the first dataset and the output example that you describe do not show the same information.

Upvotes: 2

Related Questions