Sandy
Sandy

Reputation: 1148

Complex data reshaping into wide format where input is a mix of long and wide data

I am working on fairly complex data. Here is a simplified snapshot as a data frame df.

ID      Measures              ME1   ME2 X1  X2
53-21   comm - 01 narrate      2    1   NA  NA
53-21   comm - overall         1    NA  NA  NA
53-21   comm - 10 participate  NA   NA  NA  NA
43-65   comm - 02 project      2    3   NA  NA
43-65   comm - 01 narrate      1    1   NA  NA
67-21   comm - 06 action       2    1   NA  NA
67-21   comm - 08 plan         1    1   NA  1
43-65   comm - overall         2    NA  NA  NA
53-21   comm - exhibit         1    1   NA  NA

Here:

ID = Unique user ID

Measures = Name of the item being measured or assessed for a given user

For each Measure, users can be graded on up to four different items such as ME1, ME2,X1, and X2.

I want to convert this data in a format where items are being placed in rows i.e., one ID per row and their corresponding measures in additional columns. My required reshaped data frame is something like this:

ID      comm-01-narrate-ME1 comm-01-narrate-ME2 comm-01-narrate-X1 comm-01-narrate-X2 comm-overall-ME1 comm-overall-ME2 comm-overall-X1 comm-overall-X2 comm-10-participate-ME1 comm-10-participate-ME2 comm-10-participate-X1 comm-10-participate-X2 comm-exhibit-ME1 comm-exhibit-ME2 comm-exhibit-X1 comm-exhibit-X2 comm-02-project-ME1 comm-02-project-ME2 comm-02-project-X1 comm-02-project-X2 comm-06-action-ME1 comm-06-action-ME2 comm-06-action-X1 comm-06-action-X2 comm-08-plan-ME1 comm-08-plan-ME2 comm-08-plan-X1 comm-08-plan-X2
53-21   2                   1                   NA                 NA                 1                NA               NA              NA              NA                      NA                      NA                     NA                     1                1                NA              NA              NA                  NA                  NA                 NA                 NA                 NA                 NA                NA                NA               NA               NA              NA
43-65   1                   1                   NA                 NA                 2                NA               NA              NA              NA                      NA                      NA                     NA                     NA               NA               NA              NA              2                   3                   NA                 NA                 NA                 NA                 NA                NA                NA               NA               NA              NA
67-21   NA                  NA                  NA                 NA                 NA               NA               NA              NA              NA                      NA                      NA                     NA                     NA               NA               NA              NA              NA                  NA                  NA                 NA                 2                  1                  NA                NA                1                1                NA              1

The dput() for the input file df is:

dput(df)

structure(list(ID = structure(c(2L, 2L, 2L, 1L, 1L, 3L, 3L, 1L, 2L), 
.Label = c("43-65", "53-21", "67-21"), class = "factor"), 
    Measures = structure(c(1L, 7L, 5L, 2L, 1L, 3L, 4L, 7L, 6L), 
    .Label = c("comm - 01 narrate", "comm - 02 project", "comm - 06 action", "comm - 08 plan", "comm - 10 participate", "comm - exhibit", "comm - overall"), class = "factor"), 
    ME1 = c(2L, 1L, NA, 2L, 1L, 2L, 1L, 2L, 1L), 
    ME2 = c(1L, NA, NA, 3L, 1L, 1L, 1L, NA, 1L), 
    X1 = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_), 
    X2 = c(NA, NA, NA, NA, NA, NA, 1L, NA, NA)), class = "data.frame", row.names = c(NA, -9L))

I am struggling in defining the problem and even starting the processing. The data can be considered long but because of multiple columns, it is wide as well.

Any advice on how to achieve this output file would be greatly appreciated.

Thank you for your time in reading this post.

EDIT 1

From a related post Convert data from long format to wide format with multiple measure columns I tried the following solution:

library(data.table)
df2 = dcast(setDT(df), ID~Measures, 
            value.var=c("ME1", "ME2", "X1", "X2"))

However, I get a warning:

Aggregate function missing, defaulting to 'length'

This means that the entries in my data are all changed to 1 or NA. I do not want this to happen.

EDIT 2

When I test the suggested solution on my original data, it fails. To better explain, I am providing a small sample that closely replicates my original data. None of the existing solutions work.

dput(df)

structure(list(
Id = c("39fca07f-d62e-494a-4a86-8dec54836c08", "39fca8ee-fe3f-4c85-ab0a-acb3c2db1b9c", "39fca8ed-f34c-b7e3-4229-111155aabe35", "39fca8e9-1e08-1809-c7a8-d2c8a4bc9b00", "39fc6ae5-0de8-4820-eede-343e738e7a4a", "39fca8e9-fbf9-a098-cf8c-322810997ce9"), DeliverId = c("39fb74ce-d5e6-69f6-f733-ee5fbc4689e6", "39fb74ce-d5e6-69f6-f733-ee5fbc4689e6", "39fb74ce-d5e6-69f6-f733-ee5fbc4689e6", "39fb74ce-d5e6-69f6-f733-ee5fbc4689e6", "39fb74ce-d5e6-69f6-f733-ee5fbc4689e6", "39fb74ce-d5e6-69f6-f733-ee5fbc4689e6"), 
DeliverN = c("1Assess", "1Assess", "1Assess", "1Assess", "1Assess", "1Assess"), 
AssessRId = c("39fb74cf-5fb6-4248-6d08-0e36647e190b", "39fb74cf-5fb6-4248-6d08-0e36647e190b", "39fb74cf-5fb6-4248-6d08-0e36647e190b", "39fb74cf-5fb6-4248-6d08-0e36647e190b", "39fb74cf-5fb6-4248-6d08-0e36647e190b", "39fb74cf-5fb6-4248-6d08-0e36647e190b"), 
AssessRN = c("P1", "P2", "P3", "P4", "P5", "P6"), 
AssessTId = c("1ee2684c99fa", "fd2dbea08b43", "0e0177a33282", "091b8f805553", "6e5b9301116d", "7a307a90de19"), 
AssessTN = c("Comm - 09 Narrate", "Comm - Prog Level Judge", "Comm - O Indi Level Judge", "Comm - 02 Int Prj", "Comm - 10 Learn Comm Participate", 
"Comm - 05 Exhibit"), 
S.Time = c("21/05/2020 19:47", "23/05/2020 11:06", "23/05/2020 11:05", "23/05/2020 10:59", "11/05/2020 9:58", "23/05/2020 11:00"), 
F.Time = c("24/05/2020 11:02", "23/05/2020 11:06", "23/05/2020 11:05", 
"23/05/2020 11:00", "23/05/2020 11:04", "23/05/2020 11:03"),     CompletedIndi = c(8L, 1L, 8L, 8L, 8L, 8L), 
TotalIndi = c(8L, 1L, 8L, 8L, 8L, 8L), 
Progress = c(100L, 100L, 100L, 100L, 100L, 100L), 
Build = c("Monice Island", "Pink Lasy", "", "", "", ""), 
Advice = c("Monica", "Chandler", "", "", "", ""), 
TechUserId = c(128L, 129L, 130L, 129L, 129L, 129L), 
TechName = c("Barba", "Raymond", "Raymond", "Raymond", "Raymond","Raymond"), TechEmail = c("[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]"), 
TechLife = c("0 - 2 years", "Over 10 years", "Over 10 years", "Over 10 years", "Over 10 years", "Over 10 years"), 
OtherLife = c("0 - 2 years", "5 - 10 years", "5 - 10 years", "5 - 10 years", "5 - 10 years", "5 - 10 years"), 
PersonUId = c(470L, 455L, 455L, 455L, 455L, 455L), 
PersonDName = c("Tall Tiffany", "Sharp Steff", "Sharp Steff", "Sharp Steff", "Sharp Steff", "Sharp Steff"), 
PersonFName = c("Tall", "Sharp", "Sharp", "Sharp", "Sharp", "Sharp"), PersonLName = c("Tiffany", "Steff", "Steff", "Steff", "Steff", "Steff"), PersonUID = c("2783-4409", "4307-4369", "4307-4369", "4307-4369", "4307-4369", "4307-4369"), 
Gender = c("Female", "Female", "Female", "Female", "Female", "Female"), PYear = c(2023L, 2024L, 2024L, 2024L, 2024L, 2024L), 
Course = c("Undergrad", "Grad", "Grad", "Grad", "Grad", "Grad"), 
Special = c("Yes", "No", "No", "No", "No", "No"), 
Q1 = c(2L, 1L, 3L, 3L, 2L, 2L), 
Q2 = c(1L, NA, 2L, 2L, 1L, 2L), 
Q3 = c(1L, NA, 3L, 3L, 2L, 2L), 
Q4 = c(1L, NA, 3L, 3L, 2L, 1L), 
Q5 = c(1L, NA, 2L, 2L, 1L, 2L), 
Q6 = c(1L, NA, 0L, 1L, 1L, 1L), 
Q7 = c(1L, NA, 2L, 1L, 2L, 2L), 
Q8 = c(2L, NA, 2L, 1L, 2L, 2L), 
Q9 = c(NA, NA, NA, NA, NA, NA), 
Q10 = c(NA, NA, NA, NA, NA, NA), 
X = c(NA, NA, NA, NA, NA, NA), 
X.1 = c(NA, NA, NA, NA, NA, NA), 
ListDetails = c("Missing", "Complete", "Complete", "Complete", "Complete", "Complete")), 
class = "data.frame", row.names = c(NA, -6L))

The desired output is as below:

Id                                   DeliverId                            DeliverN AssessRId                            AssessRN AssessTId      S-Time           F-Time             CompletedIndi   TotalIndi   Progress    Build           Advice      TechUserId  TechName    TechEmail       TechLife        OtherLife       PersonDName     PersonFName PersonLName PersonUID Gender PYear  Course      Special ListDetails PersonUId Q1_Comm - 02 Int Prj Q1_Comm - 05 Exhibit Q1_Comm - 09 Narrate Q1_Comm - 10 Learn Comm Participate Q1_Comm - O Indi Level Judge Q1_Comm - Prog Level Judge Q2_Comm - 02 Int Prj Q2_Comm - 05 Exhibit Q2_Comm - 09 Narrate Q2_Comm - 10 Learn Comm Participate Q2_Comm - O Indi Level Judge Q2_Comm - Prog Level Judge Q3_Comm - 02 Int Prj Q3_Comm - 05 Exhibit Q3_Comm - 09 Narrate Q3_Comm - 10 Learn Comm Participate Q3_Comm - O Indi Level Judge Q3_Comm - Prog Level Judge Q4_Comm - 02 Int Prj Q4_Comm - 05 Exhibit Q4_Comm - 09 Narrate Q4_Comm - 10 Learn Comm Participate Q4_Comm - O Indi Level Judge Q4_Comm - Prog Level Judge Q5_Comm - 02 Int Prj Q5_Comm - 05 Exhibit Q5_Comm - 09 Narrate Q5_Comm - 10 Learn Comm Participate Q5_Comm - O Indi Level Judge Q5_Comm - Prog Level Judge Q6_Comm - 02 Int Prj Q6_Comm - 05 Exhibit Q6_Comm - 09 Narrate Q6_Comm - 10 Learn Comm Participate Q6_Comm - O Indi Level Judge Q6_Comm - Prog Level Judge Q7_Comm - 02 Int Prj Q7_Comm - 05 Exhibit Q7_Comm - 09 Narrate Q7_Comm - 10 Learn Comm Participate Q7_Comm - O Indi Level Judge Q7_Comm - Prog Level Judge Q8_Comm - 02 Int Prj Q8_Comm - 05 Exhibit Q8_Comm - 09 Narrate Q8_Comm - 10 Learn Comm Participate Q8_Comm - O Indi Level Judge Q8_Comm - Prog Level Judge Q9_Comm - 02 Int Prj Q9_Comm - 05 Exhibit Q9_Comm - 09 Narrate Q9_Comm - 10 Learn Comm Participate Q9_Comm - O Indi Level Judge Q9_Comm - Prog Level Judge Q10_Comm - 02 Int Prj Q10_Comm - 05 Exhibit Q10_Comm - 09 Narrate Q10_Comm - 10 Learn Comm Participate Q10_Comm - O Indi Level Judge Q10_Comm - Prog Level Judge X_Comm - 02 Int Prj X_Comm - 05 Exhibit X_Comm - 09 Narrate X_Comm - 10 Learn Comm Participate X_Comm - O Indi Level Judge X_Comm - Prog Level Judge X.1_Comm - 02 Int Prj X.1_Comm - 05 Exhibit X.1_Comm - 09 Narrate X.1_Comm - 10 Learn Comm Participate X.1_Comm - O Indi Level Judge X.1_Comm - Prog Level Judge
39fca07f-d62e-494a-4a86-8dec54836c08 39fb74ce-d5e6-69f6-f733-ee5fbc4689e6 1Assess  39fb74cf-5fb6-4248-6d08-0e36647e190b P1       1ee2684c99fa   21/05/2020 19:47 24/05/2020 11:02   8               8           100         Monice Island   Monica      128         Barba       [email protected]  0 - 2 years     0 - 2 years     Tall Tiffany    Tall        Tiffany     2783-4409 Female 2023   Undergrad   Yes     Missing     470       NA                   NA                   2                    NA                                  NA                           NA                         NA                   NA                   1                    NA                                  NA                           NA                         NA                   NA                   1                    NA                                   NA                          NA                          NA                  NA                   1                    NA                                  NA                           NA                           NA                  NA                  1                    NA                                  NA                           NA                         NA                   NA                   1                    NA                                  NA                           NA                         NA                   NA                     1                   NA                                  NA                          NA                          NA                  NA                      2                   NA                                  NA                          NA                          NA                  NA                  NA                      NA                                  NA                          NA                          NA                  NA                      NA                  NA                                  NA                              NA                          NA                  NA                  NA                  NA                                  NA                          NA                      NA                      NA                  NA                      NA                                  NA                          NA
39fca8ee-fe3f-4c85-ab0a-acb3c2db1b9c 39fb74ce-d5e6-69f6-f733-ee5fbc4689e6 1Assess  39fb74cf-5fb6-4248-6d08-0e36647e190b P2       fd2dbea08b43   23/05/2020 11:06 23/05/2020 11:06   1               1           100         Pink Lasy       Chandler    129         Raymond     [email protected] Over 10 years   5 - 10 years    Sharp Steff     Sharp       Steff       4307-4369 Female 2024   Grad        No      Complete    455       3                    2                    NA                   2                                   3                            1                          2                    2                    NA                   1                                   2                            NA                         3                    2                    NA                   2                                    3                           NA                          3                   1                    NA                   2                                   3                            NA                           2                   2                   NA                   1                                   2                            NA                         1                    1                    NA                   1                                   0                            NA                         1                    2                      NA                  2                                   2                           NA                          1                   2                       NA                  2                                   2                           NA                          NA                  NA                  NA                      NA                                  NA                          NA                          NA                  NA                      NA                  NA                                  NA                              NA                          NA                  NA                  NA                  NA                                  NA                          NA                      NA                      NA                  NA                      NA                                  NA                          NA

Note: I don't think this question is a duplicate as none of the previous solutions work on my data set. I request you to please open my question again and make it visible please.

I would appreciate any help or advice on why the solutions do not work on my data set.

Upvotes: 1

Views: 148

Answers (1)

akrun
akrun

Reputation: 887148

We can use pivot_wider which takes multiple values_from columns

library(dplyr)
library(tidyr)
df %>%
     pivot_wider(names_from = Measures, values_from = ME1:X2)

-output

# A tibble: 3 x 29
  ID    `ME1_comm - 01 na… `ME1_comm - overa… `ME1_comm - 10 par… `ME1_comm - 02 p… `ME1_comm - 06 a… `ME1_comm - 08 p… `ME1_comm - exhi…
  <fct>              <int>              <int>               <int>             <int>             <int>             <int>             <int>
1 53-21                  2                  1                  NA                NA                NA                NA                 1
2 43-65                  1                  2                  NA                 2                NA                NA                NA
3 67-21                 NA                 NA                  NA                NA                 2                 1                NA
# … with 21 more variables: ME2_comm - 01 narrate <int>, ME2_comm - overall <int>, ME2_comm - 10 participate <int>,
#   ME2_comm - 02 project <int>, ME2_comm - 06 action <int>, ME2_comm - 08 plan <int>, ME2_comm - exhibit <int>,
#   X1_comm - 01 narrate <int>, X1_comm - overall <int>, X1_comm - 10 participate <int>, X1_comm - 02 project <int>,
#   X1_comm - 06 action <int>, X1_comm - 08 plan <int>, X1_comm - exhibit <int>, X2_comm - 01 narrate <int>, X2_comm - overall <int>,
#   X2_comm - 10 participate <int>, X2_comm - 02 project <int>, X2_comm - 06 action <int>, X2_comm - 08 plan <int>,
#   X2_comm - exhibit <int>

Upvotes: 1

Related Questions