Reputation: 1148
I am working on fairly complex data. Here is a simplified snapshot as a data frame df
.
ID Measures ME1 ME2 X1 X2
53-21 comm - 01 narrate 2 1 NA NA
53-21 comm - overall 1 NA NA NA
53-21 comm - 10 participate NA NA NA NA
43-65 comm - 02 project 2 3 NA NA
43-65 comm - 01 narrate 1 1 NA NA
67-21 comm - 06 action 2 1 NA NA
67-21 comm - 08 plan 1 1 NA 1
43-65 comm - overall 2 NA NA NA
53-21 comm - exhibit 1 1 NA NA
Here:
ID
= Unique user ID
Measures
= Name of the item being measured or assessed for a given user
For each Measure
, users can be graded on up to four different items such as ME1
, ME2
,X1
, and X2
.
I want to convert this data in a format where items are being placed in rows i.e., one ID per row and their corresponding measures in additional columns. My required reshaped data frame is something like this:
ID comm-01-narrate-ME1 comm-01-narrate-ME2 comm-01-narrate-X1 comm-01-narrate-X2 comm-overall-ME1 comm-overall-ME2 comm-overall-X1 comm-overall-X2 comm-10-participate-ME1 comm-10-participate-ME2 comm-10-participate-X1 comm-10-participate-X2 comm-exhibit-ME1 comm-exhibit-ME2 comm-exhibit-X1 comm-exhibit-X2 comm-02-project-ME1 comm-02-project-ME2 comm-02-project-X1 comm-02-project-X2 comm-06-action-ME1 comm-06-action-ME2 comm-06-action-X1 comm-06-action-X2 comm-08-plan-ME1 comm-08-plan-ME2 comm-08-plan-X1 comm-08-plan-X2
53-21 2 1 NA NA 1 NA NA NA NA NA NA NA 1 1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA
43-65 1 1 NA NA 2 NA NA NA NA NA NA NA NA NA NA NA 2 3 NA NA NA NA NA NA NA NA NA NA
67-21 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 2 1 NA NA 1 1 NA 1
The dput()
for the input file df
is:
dput(df)
structure(list(ID = structure(c(2L, 2L, 2L, 1L, 1L, 3L, 3L, 1L, 2L),
.Label = c("43-65", "53-21", "67-21"), class = "factor"),
Measures = structure(c(1L, 7L, 5L, 2L, 1L, 3L, 4L, 7L, 6L),
.Label = c("comm - 01 narrate", "comm - 02 project", "comm - 06 action", "comm - 08 plan", "comm - 10 participate", "comm - exhibit", "comm - overall"), class = "factor"),
ME1 = c(2L, 1L, NA, 2L, 1L, 2L, 1L, 2L, 1L),
ME2 = c(1L, NA, NA, 3L, 1L, 1L, 1L, NA, 1L),
X1 = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_),
X2 = c(NA, NA, NA, NA, NA, NA, 1L, NA, NA)), class = "data.frame", row.names = c(NA, -9L))
I am struggling in defining the problem and even starting the processing. The data can be considered long
but because of multiple columns, it is wide
as well.
Any advice on how to achieve this output file would be greatly appreciated.
Thank you for your time in reading this post.
EDIT 1
From a related post Convert data from long format to wide format with multiple measure columns I tried the following solution:
library(data.table)
df2 = dcast(setDT(df), ID~Measures,
value.var=c("ME1", "ME2", "X1", "X2"))
However, I get a warning:
Aggregate function missing, defaulting to 'length'
This means that the entries in my data are all changed to 1 or NA. I do not want this to happen.
EDIT 2
When I test the suggested solution on my original data, it fails. To better explain, I am providing a small sample that closely replicates my original data. None of the existing solutions work.
dput(df)
structure(list(
Id = c("39fca07f-d62e-494a-4a86-8dec54836c08", "39fca8ee-fe3f-4c85-ab0a-acb3c2db1b9c", "39fca8ed-f34c-b7e3-4229-111155aabe35", "39fca8e9-1e08-1809-c7a8-d2c8a4bc9b00", "39fc6ae5-0de8-4820-eede-343e738e7a4a", "39fca8e9-fbf9-a098-cf8c-322810997ce9"), DeliverId = c("39fb74ce-d5e6-69f6-f733-ee5fbc4689e6", "39fb74ce-d5e6-69f6-f733-ee5fbc4689e6", "39fb74ce-d5e6-69f6-f733-ee5fbc4689e6", "39fb74ce-d5e6-69f6-f733-ee5fbc4689e6", "39fb74ce-d5e6-69f6-f733-ee5fbc4689e6", "39fb74ce-d5e6-69f6-f733-ee5fbc4689e6"),
DeliverN = c("1Assess", "1Assess", "1Assess", "1Assess", "1Assess", "1Assess"),
AssessRId = c("39fb74cf-5fb6-4248-6d08-0e36647e190b", "39fb74cf-5fb6-4248-6d08-0e36647e190b", "39fb74cf-5fb6-4248-6d08-0e36647e190b", "39fb74cf-5fb6-4248-6d08-0e36647e190b", "39fb74cf-5fb6-4248-6d08-0e36647e190b", "39fb74cf-5fb6-4248-6d08-0e36647e190b"),
AssessRN = c("P1", "P2", "P3", "P4", "P5", "P6"),
AssessTId = c("1ee2684c99fa", "fd2dbea08b43", "0e0177a33282", "091b8f805553", "6e5b9301116d", "7a307a90de19"),
AssessTN = c("Comm - 09 Narrate", "Comm - Prog Level Judge", "Comm - O Indi Level Judge", "Comm - 02 Int Prj", "Comm - 10 Learn Comm Participate",
"Comm - 05 Exhibit"),
S.Time = c("21/05/2020 19:47", "23/05/2020 11:06", "23/05/2020 11:05", "23/05/2020 10:59", "11/05/2020 9:58", "23/05/2020 11:00"),
F.Time = c("24/05/2020 11:02", "23/05/2020 11:06", "23/05/2020 11:05",
"23/05/2020 11:00", "23/05/2020 11:04", "23/05/2020 11:03"), CompletedIndi = c(8L, 1L, 8L, 8L, 8L, 8L),
TotalIndi = c(8L, 1L, 8L, 8L, 8L, 8L),
Progress = c(100L, 100L, 100L, 100L, 100L, 100L),
Build = c("Monice Island", "Pink Lasy", "", "", "", ""),
Advice = c("Monica", "Chandler", "", "", "", ""),
TechUserId = c(128L, 129L, 130L, 129L, 129L, 129L),
TechName = c("Barba", "Raymond", "Raymond", "Raymond", "Raymond","Raymond"), TechEmail = c("[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]"),
TechLife = c("0 - 2 years", "Over 10 years", "Over 10 years", "Over 10 years", "Over 10 years", "Over 10 years"),
OtherLife = c("0 - 2 years", "5 - 10 years", "5 - 10 years", "5 - 10 years", "5 - 10 years", "5 - 10 years"),
PersonUId = c(470L, 455L, 455L, 455L, 455L, 455L),
PersonDName = c("Tall Tiffany", "Sharp Steff", "Sharp Steff", "Sharp Steff", "Sharp Steff", "Sharp Steff"),
PersonFName = c("Tall", "Sharp", "Sharp", "Sharp", "Sharp", "Sharp"), PersonLName = c("Tiffany", "Steff", "Steff", "Steff", "Steff", "Steff"), PersonUID = c("2783-4409", "4307-4369", "4307-4369", "4307-4369", "4307-4369", "4307-4369"),
Gender = c("Female", "Female", "Female", "Female", "Female", "Female"), PYear = c(2023L, 2024L, 2024L, 2024L, 2024L, 2024L),
Course = c("Undergrad", "Grad", "Grad", "Grad", "Grad", "Grad"),
Special = c("Yes", "No", "No", "No", "No", "No"),
Q1 = c(2L, 1L, 3L, 3L, 2L, 2L),
Q2 = c(1L, NA, 2L, 2L, 1L, 2L),
Q3 = c(1L, NA, 3L, 3L, 2L, 2L),
Q4 = c(1L, NA, 3L, 3L, 2L, 1L),
Q5 = c(1L, NA, 2L, 2L, 1L, 2L),
Q6 = c(1L, NA, 0L, 1L, 1L, 1L),
Q7 = c(1L, NA, 2L, 1L, 2L, 2L),
Q8 = c(2L, NA, 2L, 1L, 2L, 2L),
Q9 = c(NA, NA, NA, NA, NA, NA),
Q10 = c(NA, NA, NA, NA, NA, NA),
X = c(NA, NA, NA, NA, NA, NA),
X.1 = c(NA, NA, NA, NA, NA, NA),
ListDetails = c("Missing", "Complete", "Complete", "Complete", "Complete", "Complete")),
class = "data.frame", row.names = c(NA, -6L))
The desired output is as below:
Id DeliverId DeliverN AssessRId AssessRN AssessTId S-Time F-Time CompletedIndi TotalIndi Progress Build Advice TechUserId TechName TechEmail TechLife OtherLife PersonDName PersonFName PersonLName PersonUID Gender PYear Course Special ListDetails PersonUId Q1_Comm - 02 Int Prj Q1_Comm - 05 Exhibit Q1_Comm - 09 Narrate Q1_Comm - 10 Learn Comm Participate Q1_Comm - O Indi Level Judge Q1_Comm - Prog Level Judge Q2_Comm - 02 Int Prj Q2_Comm - 05 Exhibit Q2_Comm - 09 Narrate Q2_Comm - 10 Learn Comm Participate Q2_Comm - O Indi Level Judge Q2_Comm - Prog Level Judge Q3_Comm - 02 Int Prj Q3_Comm - 05 Exhibit Q3_Comm - 09 Narrate Q3_Comm - 10 Learn Comm Participate Q3_Comm - O Indi Level Judge Q3_Comm - Prog Level Judge Q4_Comm - 02 Int Prj Q4_Comm - 05 Exhibit Q4_Comm - 09 Narrate Q4_Comm - 10 Learn Comm Participate Q4_Comm - O Indi Level Judge Q4_Comm - Prog Level Judge Q5_Comm - 02 Int Prj Q5_Comm - 05 Exhibit Q5_Comm - 09 Narrate Q5_Comm - 10 Learn Comm Participate Q5_Comm - O Indi Level Judge Q5_Comm - Prog Level Judge Q6_Comm - 02 Int Prj Q6_Comm - 05 Exhibit Q6_Comm - 09 Narrate Q6_Comm - 10 Learn Comm Participate Q6_Comm - O Indi Level Judge Q6_Comm - Prog Level Judge Q7_Comm - 02 Int Prj Q7_Comm - 05 Exhibit Q7_Comm - 09 Narrate Q7_Comm - 10 Learn Comm Participate Q7_Comm - O Indi Level Judge Q7_Comm - Prog Level Judge Q8_Comm - 02 Int Prj Q8_Comm - 05 Exhibit Q8_Comm - 09 Narrate Q8_Comm - 10 Learn Comm Participate Q8_Comm - O Indi Level Judge Q8_Comm - Prog Level Judge Q9_Comm - 02 Int Prj Q9_Comm - 05 Exhibit Q9_Comm - 09 Narrate Q9_Comm - 10 Learn Comm Participate Q9_Comm - O Indi Level Judge Q9_Comm - Prog Level Judge Q10_Comm - 02 Int Prj Q10_Comm - 05 Exhibit Q10_Comm - 09 Narrate Q10_Comm - 10 Learn Comm Participate Q10_Comm - O Indi Level Judge Q10_Comm - Prog Level Judge X_Comm - 02 Int Prj X_Comm - 05 Exhibit X_Comm - 09 Narrate X_Comm - 10 Learn Comm Participate X_Comm - O Indi Level Judge X_Comm - Prog Level Judge X.1_Comm - 02 Int Prj X.1_Comm - 05 Exhibit X.1_Comm - 09 Narrate X.1_Comm - 10 Learn Comm Participate X.1_Comm - O Indi Level Judge X.1_Comm - Prog Level Judge
39fca07f-d62e-494a-4a86-8dec54836c08 39fb74ce-d5e6-69f6-f733-ee5fbc4689e6 1Assess 39fb74cf-5fb6-4248-6d08-0e36647e190b P1 1ee2684c99fa 21/05/2020 19:47 24/05/2020 11:02 8 8 100 Monice Island Monica 128 Barba [email protected] 0 - 2 years 0 - 2 years Tall Tiffany Tall Tiffany 2783-4409 Female 2023 Undergrad Yes Missing 470 NA NA 2 NA NA NA NA NA 1 NA NA NA NA NA 1 NA NA NA NA NA 1 NA NA NA NA NA 1 NA NA NA NA NA 1 NA NA NA NA NA 1 NA NA NA NA NA 2 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
39fca8ee-fe3f-4c85-ab0a-acb3c2db1b9c 39fb74ce-d5e6-69f6-f733-ee5fbc4689e6 1Assess 39fb74cf-5fb6-4248-6d08-0e36647e190b P2 fd2dbea08b43 23/05/2020 11:06 23/05/2020 11:06 1 1 100 Pink Lasy Chandler 129 Raymond [email protected] Over 10 years 5 - 10 years Sharp Steff Sharp Steff 4307-4369 Female 2024 Grad No Complete 455 3 2 NA 2 3 1 2 2 NA 1 2 NA 3 2 NA 2 3 NA 3 1 NA 2 3 NA 2 2 NA 1 2 NA 1 1 NA 1 0 NA 1 2 NA 2 2 NA 1 2 NA 2 2 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Note: I don't think this question is a duplicate as none of the previous solutions work on my data set. I request you to please open my question again and make it visible please.
I would appreciate any help or advice on why the solutions do not work on my data set.
Upvotes: 1
Views: 148
Reputation: 887148
We can use pivot_wider
which takes multiple values_from
columns
library(dplyr)
library(tidyr)
df %>%
pivot_wider(names_from = Measures, values_from = ME1:X2)
-output
# A tibble: 3 x 29
ID `ME1_comm - 01 na… `ME1_comm - overa… `ME1_comm - 10 par… `ME1_comm - 02 p… `ME1_comm - 06 a… `ME1_comm - 08 p… `ME1_comm - exhi…
<fct> <int> <int> <int> <int> <int> <int> <int>
1 53-21 2 1 NA NA NA NA 1
2 43-65 1 2 NA 2 NA NA NA
3 67-21 NA NA NA NA 2 1 NA
# … with 21 more variables: ME2_comm - 01 narrate <int>, ME2_comm - overall <int>, ME2_comm - 10 participate <int>,
# ME2_comm - 02 project <int>, ME2_comm - 06 action <int>, ME2_comm - 08 plan <int>, ME2_comm - exhibit <int>,
# X1_comm - 01 narrate <int>, X1_comm - overall <int>, X1_comm - 10 participate <int>, X1_comm - 02 project <int>,
# X1_comm - 06 action <int>, X1_comm - 08 plan <int>, X1_comm - exhibit <int>, X2_comm - 01 narrate <int>, X2_comm - overall <int>,
# X2_comm - 10 participate <int>, X2_comm - 02 project <int>, X2_comm - 06 action <int>, X2_comm - 08 plan <int>,
# X2_comm - exhibit <int>
Upvotes: 1