SF1
SF1

Reputation: 469

Wide to long with many different columns

I have used pivot_longer before but this time I have a much more complex wide dataframe and I cannot sort it out. The example code will provide you a reproducible dataframe. I haven't dealt with such thing before so I'm not sure it's correct to try to format this type of df in long format?

df <- data.frame(
  ID = as.numeric(c("7","8","10","11","13","15","16")),
  AGE = as.character(c("45 – 54","25 – 34","25 – 34","25 – 34","25 – 34","18 – 24","35 – 44")),
  GENDER = as.character(c("Female","Female","Male","Female","Other","Male","Female")),
  SD = as.numeric(c("3","0","0","0","3","2","0")),
  GAMING = as.numeric(c("0","0","0","0","2","2","0")),
  HW = as.numeric(c("2","2","0","2","2","2","2")),
  R1_1 = as.numeric(c("10","34","69","53","79","55","28")),
  M1_1 = as.numeric(c("65","32","64","53","87","55","27")),
  P1_1 = as.numeric(c("65","38","67","54","88","44","26")),
  R1_2 = as.numeric(c("15","57","37","54","75","91","37")),
  M1_2 = as.numeric(c("90","26","42","56","74","90","37")),
  P1_2 = as.numeric(c("90","44","33","54","79","95","37")),
  R1_3 = as.numeric(c("5","47","80","27","61","19","57")),
  M1_3 = as.numeric(c("30","71","80","34","71","15","57")),
  P1_3 = as.numeric(c("30","36","81","35","62","8","56")),
  R2_1 = as.numeric(c("10","39","75","31","71","80","59")),
  M2_1 = as.numeric(c("90","51","74","15","70","75","61")),
  P2_1 = as.numeric(c("90","52","35","34","69","83","60")),
  R2_2 = as.numeric(c("10","45","31","54","39","95","77")),
  M2_2 = as.numeric(c("60","70","40","78","5","97","75")),
  P2_2 = as.numeric(c("60","40","41","58","9","97","76")),
  R2_3 = as.numeric(c("5","38","78","45","25","16","22")),
  M2_3 = as.numeric(c("30","34","84","62","33","52","20")),
  P2_3 = as.numeric(c("30","34","82","45","32","16","22")),
  R3_1 = as.numeric(c("10","40","41","42","62","89","41")),
  M3_1 = as.numeric(c("90","67","37","40","27","89","42")),
  P3_1 = as.numeric(c("90","34","51","44","38","84","43")),
  R3_2 = as.numeric(c("10","37","20","54","8","93","69")),
  M3_2 = as.numeric(c("60","38","21","62","5","95","71")),
  P3_2 = as.numeric(c("60","38","23","65","14","92","69")),
  R3_3 = as.numeric(c("5","30","62","11","60","32","52")),
  M3_3 = as.numeric(c("30","67","34","55","45","25","45")),
  P3_3 = as.numeric(c("30","28","41","24","53","23","52")),
  R1_4 = as.numeric(c("10","40","61","17","39","72","25")),
  M1_4 = as.numeric(c("45","20","63","25","62","70","23")),
  P1_4 = as.numeric(c("45","52","56","16","26","72","27")),
  R2_4 = as.numeric(c("5","21","70","33","80","68","30")),
  M2_4 = as.numeric(c("35","21","69","27","85","69","23")),
  P2_4 = as.numeric(c("35","32","34","25","79","63","29")),
  R3_4 = as.numeric(c("10","29","68","21","8","71","41")),
  M3_4 = as.numeric(c("50","37","66","28","33","65","41")),
  P3_4 = as.numeric(c("50","38","47","28","24","71","41"))
  )

I would like to sort it out like in the following table enter image description here

the new column names are extracted from the old ones such that (example) in R1_1:

basically each row corresponds to 1 question answered by 1 person, and each question was answered through 3 different ratings (R, M, P)

thank you!

Upvotes: 3

Views: 61

Answers (2)

jay.sf
jay.sf

Reputation: 73712

Solution using our good ol' workhorse reshape. At first we grep the names with a "Wd_d" pattern, as well as their suffixes "d_d" for following use in reshape.

nm <- names(df[grep("_\\d", names(df))])
times <- unique(substr(nm, 2, 4))

res <- reshape(df, idvar="ID", varying=7:42, v.names=unique(substr(nm, 1, 1)),
               times=times,direction="long")

Getting us close to the result, we just need to strsplit the newly created "time" variable at the "_" and rbind it to the former.

res <- cbind(res, setNames(type.convert(do.call(rbind.data.frame, 
                                                strsplit(res$time, "_"))), 
                           c("Speed", "Sound")))

res <- res[order(res$AGE), ]  ## some ordering

Result

head(res)
#        ID     AGE GENDER SD GAMING HW time  R  M  P Speed Sound
# 15.1_1 15 18 – 24   Male  2      2  2  1_1 55 44 55     1     1
# 15.1_2 15 18 – 24   Male  2      2  2  1_2 90 95 91     1     2
# 15.1_3 15 18 – 24   Male  2      2  2  1_3 15  8 19     1     3
# 15.2_1 15 18 – 24   Male  2      2  2  2_1 75 83 80     2     1
# 15.2_2 15 18 – 24   Male  2      2  2  2_2 97 97 95     2     2
# 15.2_3 15 18 – 24   Male  2      2  2  2_3 52 16 16     2     3

Upvotes: 1

Konrad Rudolph
Konrad Rudolph

Reputation: 546083

If I understood you correctly, the following should work:

df %>%
    pivot_longer(
        cols = matches('[RMP]\\d_\\d'),
        names_to = c('RMP', 'Speed', 'Sound'),
        values_to = 'Data',
        names_pattern = '([RMP])(\\d)_(\\d)'
    ) %>%
    pivot_wider(names_from = RMP, values_from = Data)

This assumes that both “speed” and “sound” are single-digit values. If there’s the possibility of multiple digits, the occurrences of \\d in the patterns above need to be replaced by \\d+.

Upvotes: 4

Related Questions