Matt
Matt

Reputation: 45

Repeated measures in messy format, need help to tidy

I have a very large data set containing weekly weights that have been coded with week of study and the weight at that visit. There are some missing visits and the data is not currently aligned.

df <- data.frame(ID=1:3, Week_A=c(6,6,7), Weight_A=c(23,24,23), Week_B=c(7,7,8), 
                 Weight_B=c(25,26,27), Week_C=c(8,9,9), Weight_C=c(27,26,28)) 

df
  ID Week_A Weight_A Week_B Weight_B Week_C Weight_C
1  1      6       23      7       25      8       27
2  2      6       24      7       26      9       26
3  3      7       23      8       27      9       28

I would like to align the data by week number (ideal output below).

df_ideal <- data.frame (ID=1:3, Week_6=c(23,24,NA), Week_7=c(25,26,23), 
                        Week_8=c(27,NA,27), Week_9=c(NA,26,28))

df_ideal
  ID Week_6 Week_7 Week_8 Week_9
1  1     23     25     27     NA
2  2     24     26     NA     26
3  3     NA     23     27     28

I would appreciate some help with this, even to find a starting point to manipulate this data to an easier to manage format.

Upvotes: 2

Views: 78

Answers (4)

LmW.
LmW.

Reputation: 1364

A tidyverse solution:

df <- data.frame(ID=1:3, 
                 Week_A=c(6,6,7), 
                 Weight_A=c(23,24,23), 
                 Week_B=c(7,7,8), 
                 Weight_B=c(25,26,27),
                 Week_C=c(8,9,9),
                 Weight_C=c(27,26,28))

library(tidyverse)
df_long <- df %>% gather(key="v", value="value", -ID) %>% 
  separate(v, into=c("v1", "v2")) %>% 
  spread(v1, value) %>% 
  complete(ID, Week) %>% 
  arrange(Week, ID)

df_long
# A tibble: 12 x 4
#      ID  Week v2    Weight
#   <int> <dbl> <chr>  <dbl>
# 1     1     6 A         23
# 2     2     6 A         24
# 3     3     6 <NA>      NA
# 4     1     7 B         25
# 5     2     7 B         26
# 6     3     7 A         23
# 7     1     8 C         27
# 8     2     8 <NA>      NA
# 9     3     8 B         27
#10     1     9 <NA>      NA
#11     2     9 C         26
#12     3     9 C         28

df_wide <- df_long %>% select(-v2) %>% 
  spread(Week, Weight, sep="_")
df_wide
# A tibble: 3 x 5
#     ID Week_6 Week_7 Week_8 Week_9
#  <int>  <dbl>  <dbl>  <dbl>  <dbl>
#1     1     23     25     27     NA
#2     2     24     26     NA     26
#3     3     NA     23     27     28

Personally, I'd keep using df_long instead of df_wide, as it is a tidy data frame, while df_wide is not.

Upvotes: 2

thelatemail
thelatemail

Reputation: 93908

In base R, it's a double reshape, firstly to long and then back to wide on a different variable:

tmp <- reshape(df, idvar="ID", varying=lapply(c("Week_","Weight_"), grep, names(df)),
               v.names=c("time","Week"), direction="long")
reshape(tmp, idvar="ID", direction="wide", sep="_")

#    ID Week_6 Week_7 Week_8 Week_9
#1.1  1     23     25     27     NA
#2.1  2     24     26     NA     26
#3.1  3     NA     23     27     28

Upvotes: 0

Maurits Evers
Maurits Evers

Reputation: 50718

Another tidyverse solution using a double-gather with a final spread

df %>%
    gather(k, v, -ID, -starts_with("Weight")) %>%
    separate(k, into = c("k1", "k2")) %>%
    unite(k1, k1, v) %>%
    gather(k, v, starts_with("Weight")) %>%
    separate(k, into = c("k3", "k4")) %>%
    filter(k2 == k4) %>%
    select(-k2, -k3, -k4) %>%
    spread(k1, v)
#  ID Week_6 Week_7 Week_8 Week_9
#1  1     23     25     27     NA
#2  2     24     26     NA     26
#3  3     NA     23     27     28

Upvotes: 1

chinsoon12
chinsoon12

Reputation: 25225

Here is a possible approach using the data.table package

library(data.table)
#convert into a data.table
setDT(df)

#convert into a long format
mdat <- melt(df, id.vars="ID", measure.vars=patterns("^Week", "^Weight", cols=names(df)))

#pivot into desired output
ans <- dcast(mdat, ID ~ value1, value.var="value2")

ans output:

   ID  6  7  8  9
1:  1 23 25 27 NA
2:  2 24 26 NA 26
3:  3 NA 23 27 28

And if you really need the "Week_" in your column names, you can use

setnames(ans, names(ans)[-1L], paste("Week_", names(ans)[-1L]))

Upvotes: 1

Related Questions