Reputation: 477
I have a trouble to pivot different vars in one dataset. My original dataset looks like:
childid<- c(m1, m2,m3,m4,m5)
begin.age<-c(12,3,NA,4, 5)
end.age<-c(14,5,6,NA, 7)
begin.weight<- c(1,3,NA,4, 5)
end.weight <-c(4,5,7,NA, 8)
phase<-c(1,2,3,4,5)
df<- data.frame(childid, begin.age, end.age, begin.weight, end.weight,phase)
now, I am trying to use pivot_longer function to pivot age and weight, the ideal output should be:
childid timepoint age weight phase
m1 begin 12 1 1
m1 end 14 4 1
m2 begin 3 3 3
m2 end 5 5 3
m3 begin NA NA 4
m3 end 6 7 4
.....
i tried to split the dataset into two subdatasets, and use piovt_longer to convert each of variable (age, weight), but i cannot use left_join to merge them back successfully. the merged dataset is multiple times than it should be. I assume bc the data was collected repeatedly and there is no unique merge key for both dataset.
df_a<-df%>%
pivot_longer (
cols=c(`begin.age`,`end.age`)
names_to="timepoint",
values_to="age"
df_w<-df%>%
pivot_longer (
cols=c(`begin.weight`,`end.weight`)
names_to="timepoint",
values_to="weight"
)
df_p<- df_a%>%
left_join (df_w, by="childid")
Is there any other better way to achieve this goal without splitting the datasets in R? Thanks a lot~~!
Upvotes: 2
Views: 147
Reputation: 78927
Alternative approach: Similar to akrun's approach but using names_sep
instead of names_pattern
:
library(dplyr)
library(tidyr)
df %>%
pivot_longer(
cols = -c(childid, phase),
names_to = c("timepoint", ".value"),
names_sep = "\\.",
values_drop_na = TRUE
)
childid phase timepoint age weight
<chr> <dbl> <chr> <dbl> <dbl>
1 m1 1 begin 12 1
2 m1 1 end 14 4
3 m2 2 begin 3 3
4 m2 2 end 5 5
5 m3 3 end 6 7
6 m4 4 begin 4 4
7 m5 5 begin 5 5
8 m5 5 end 7 8
Upvotes: 2
Reputation: 887118
Perhaps this helps
library(tidyr)
pivot_longer(df, cols = matches('^(begin|end)'),
names_to = c("timepoint", ".value"),
names_pattern = "(.*)\\.(\\w+)$", values_drop_na = TRUE)
df <- structure(list(childid = c("m1", "m2", "m3", NA, "m4"), begin.age = c(12,
3, NA, 4, 5), end.age = c(14, 5, 6, NA, 7), begin.weight = c(1,
3, NA, 4, 5), end.weight = c(4, 5, 7, NA, 8), phase = c(1, 2,
3, 4, 5)), class = "data.frame", row.names = c(NA, -5L))
Upvotes: 2