Transforming multible columns from wide to long format.

Question

First up - I don´t think this is a duplicate post. I have found several great posts and webpages on transforming multiple columns from wide to long format, but none of them are similar to my data since they deal with several columns that are measures of the same thing (in my example that would be A1, A2, A3, A4), and also they do not already contain a long format variable (frame in my case).

Here is my question:

I am working with a dataset containing many variabels measured by two different motion capture systems. Currently my dataset is in wide format, but I´ve realized ggplot works much better with long format, and I therefore wish to transform my data.

Here is a very simplified version of my data:

id <- (rep(1:3, each = 3))
frame <- (rep(1:3, 3))
A1 <- runif(9, min =1, max =100)
B1 <- runif(9, min =1, max =10)
C1 <- runif(9, min =-10, max =10)
A2 <- rnorm(9, mean = A1, sd=1)
B2 <- rnorm(9, mean = B1, sd=1)
C2 <- rnorm(9, mean = C1, sd=1)
df_wide <- as.data.frame.matrix(cbind(id, frame, A1, B1, C1, A2, B2, C2))
rm(id, frame, A1, A2, B1, B2, C1, C2)

df_wide$id <- as.factor(df_wide$id)
df_wide$frame <- as.factor(df_wide$frame)

head(df_wide)

  id frame        A1       B1        C1        A2       B2         C2
1  1     1 50.940395 4.141713 -1.294736 51.324398 4.271260  0.6174782
2  1     2 33.117691 5.044080  1.820367 32.977860 5.506677  0.8811504
3  1     3 50.000625 8.584148 -1.294245 50.603195 8.099262  0.6418580
4  2     1 61.675927 5.269216 -6.002856 61.996378 6.186417 -6.5428624
5  2     2  5.514353 6.570010  5.199728  4.798275 4.955662  5.1502535
6  2     3 51.580086 5.683788  9.831663 50.717459 5.430070 10.9601541

A1 and A2, B1 and B2, are measures of the same type of movements (A and B) made by system 1 and system 2. Each patient has been measured several times as indicated by the frame variable.

I would like my data.frame to look like this:

  id frame system        A        B         C
1  1     1      1 
2  1     1      2 
3  1     2      1 
4  1     2      2 
5  1     3      1 
6  1     3      2

I have two problems keeping me from solving this:

1) measurements between the two systems are not placed next to each other. Therefore I cant use code like this:

library(tidyr)
df_long <- gather(df_wide, System, A, A1:A2, factor_key=TRUE)

2) My dataset contains close to 120 variabels, so I would like a solution that wouldn't require me to write the code for each variable. I am considering making a loop to solve this, but any help in this regard will be much appreciated as well.

akuiper · Accepted Answer

The tidyr approach would be: 1) gather the measures columns, 2) separate the headers into movements (alpha) + system(numeric) using extract with regex, 3) spread movements to headers:

library(tidyr)

df_wide %>% 
    gather(keys, values, -id, -frame) %>% 
    extract(keys, c("movements", "system"), "([a-zA-Z]+)([0-9]+)") %>% 
    spread(movements, values)

#   id frame system         A        B         C
#1   1     1      1 62.175823 9.661748 -9.120404
#2   1     1      2 62.957358 9.229938 -8.814429
#3   1     2      1 22.463641 3.904546  4.059267
#4   1     2      2 22.798492 3.045190  4.663611
#5   1     3      1 13.897632 6.675986 -9.528184
#6   1     3      2 15.036539 6.964412 -8.920507
#7   2     1      1 38.765030 7.735174  8.373283
#8   2     1      2 40.124285 4.947368 10.143035
#9   2     2      1  5.924254 9.358200  9.866305
#10  2     2      2  5.197255 9.859347 10.088928
#11  2     3      1 29.961107 7.451472 -3.143658
#12  2     3      2 31.322740 8.328626 -2.050261
#13  3     1      1 71.010782 6.909414  7.128306
#14  3     1      2 69.860047 7.675693  7.817473
#15  3     2      1 64.985282 1.596932 -3.422237
#16  3     2      2 64.839996 2.828168 -3.826748
#17  3     3      1 70.631159 1.238806  5.398818
#18  3     3      2 70.963814 1.255340  3.728302

Transforming multible columns from wide to long format.

Answers (2)

Related Questions