Reputation: 312
This is the edited version of the question.
I need help to convert my wide data to long format data using the pivot_longer()
function in R. The main problem is wanting to create long data with a variable nested in another variable.
For example, if I have wide data like this, where
fu1
and fu2
are variables for the follow-up (in days). There are two follow-up events (fu1
and fu2
)cpass
and is
are the results of two tests at each follow upIDno <- c(1,2)
Sex <- c("M","F")
fu1 <- c(13,15)
fu2 <- c(20,18)
cpass1 <- c(27, 85)
cpass2 <- c(33, 90)
is1 <- c(201, 400)
is2 <- c(220, 430)
mydata <- data.frame(IDno, Sex,
fu1, cpass1, is1,
fu2, cpass2, is2)
mydata
which looks like this
And now, I want to convert it to long format data, and it should look like this:
I have tried the codes below, but they do not produce the data frame in the format that I want:
#renaming variables
mydata_wide <- mydata %>%
rename(fu1_day = fu1,
cp_one = cpass1,
is_one = is1,
fu2_day = fu2,
cp_two = cpass2,
is_two = is2)
#pivoting
mydata_wide %>%
pivot_longer(
cols = c(fu1_day, fu2_day),
names_to = c("fu", ".value"),
values_to = "day",
names_sep = "_") %>%
pivot_longer(
cols = c("cp_one", "is_one", "cp_two", "is_two"),
names_to = c("test", ".value"),
values_to = "value",
names_sep = "_")
The data frame, unfortunately, looks like this:
I have looked at some tutorials but have not found the best solution for this problem. Any help is very much appreciated.
Upvotes: 0
Views: 3906
Reputation: 312
I took inspiration from almost similar post from How to reshape Panel / Longitudinal survey data from wide to long format using pivot_longer and from the solution provided by RobertoT and put together these codes:
STEP 1: Generate wide data for simulation
IDno <- c(1,2)
Sex <- c("M","F")
fu1_day <- c(13,15)
fu2_day <- c(20,18)
fu1_cpass <- c(27, 85)
fu2_cpass <- c(33, 90)
fu1_is <- c(201, 400)
fu2_is <- c(220, 430)
mydata_wide <- data.frame(IDno, Sex,
fu1_day, fu1_cpass, fu1_is,
fu2_day, fu2_cpass, fu2_is)
mydata_wide
STEP 1: CONVERT TO LONG DATA (out1
)
out1 <- mydata_wide %>%
select(IDno, contains("day")) %>%
pivot_longer(cols = c(fu1_day, fu2_day),
names_to = c('fu', '.value'),
names_sep="_")
out1
STEP 2: CREATE ANOTHER LONG DATA AND JOIN WITH out1
mydata_wide %>%
select(-contains('day')) %>%
pivot_longer(cols = -c(IDno, Sex),
names_to = c('fu', 'test'),
names_sep="_") %>%
left_join(out1)
The result looks like this
Upvotes: 1
Reputation: 1683
library(tidyverse)
mydata %>% # the "nested" pivoting must be done within two calls
pivot_longer(cols=c(fu1,fu2),names_to = 'fu', values_to = 'day') %>%
pivot_longer(cols=c(starts_with('cpass'), starts_with('is')),
names_to = 'test', values_to = 'value') %>%
# with this filter check not mixing the tests and the follow-ups
filter(str_extract(fu,"\\d") == str_extract(test,"\\d")) %>%
mutate(test = gsub("\\d","",test)) # remove numbers in strings
Output:
# A tibble: 8 × 6
IDno Sex fu day test value
<dbl> <chr> <chr> <dbl> <chr> <dbl>
1 1 M fu1 13 cpass 27
2 1 M fu1 13 is 201
3 1 M fu2 20 cpass 33
4 1 M fu2 20 is 220
5 2 F fu1 15 cpass 85
6 2 F fu1 15 is 400
7 2 F fu2 18 cpass 90
8 2 F fu2 18 is 430
I'm not sure if your example is your real expected output, the first dataset and the output example that you describe do not show the same information.
Upvotes: 2