Reputation: 760
I'm trying to reshape my data into wide form, but I'm new to this, and so far the ordinary reshape function doesn't seem to work. When I try with the full data (this excerpt seems to work though) I get NA values for all of my values and the column names turn to strange numerical vectors.
My data looks like this, PNR is an id variable that's unique to each observation.
PNR ZIPCODE SEL_CRITERION QUAL_RATING QUEUENUMBER UPSEC_ID UPSEC_COURSE_ID MARK COURSEOFFERING_ID ADMISSIONROUND_ID RESULT WITHIN_PROGRAM SUMMA
1234567890 46395 HB 55 0 HRF SV203 G 97116 HT2019 20 0 67.5
1234567890 46395 HB 55 0 HRF EN200 VG 97116 HT2019 20 0 67.5
1234567890 46395 HB 55 0 HRF MA200 VG 97116 HT2019 20 0 67.5
1234567890 46395 HB 55 0 HRF <null> <null> 97116 HT2019 20 0 67.5
2345678901 42332 B5 2645 0 3SB EN1201 VG 97116 HT2019 20 0 70.5
2345678901 42332 B5 2645 0 3SB MA1201 VG 97113 HT2019 20 0 70.5
2345678901 42332 B5 2645 0 2SM SV1201 VG 97113 HT2019 20 0 70.5
I would like for it to look something like this:
PNR ZIPCODE HB B5 QUEUENUMBER UPSEC_ID SV203 EN200 MA200 <null> EN1201 MA1201 SV1201
1234567890 46395 95 NA 0 HRF G VG VG NA NA NA NA
2345678901 42332 NA 1645 0 3SB NA NA NA NA VG VG VG
Is there any way I can make this happen?
I've tried to use the ordinary reshape function, which fails miserably (though not necessarily on this small excerpt):
test<-reshape(HT2018, idvar="PNR",timevar=c("SEL_CRITERION", "UPSEC_COURSE_ID"), v.names=c("QUAL_RATING","MARK"), direction = "wide")
I've also tried the melt and cast functions from the reshape2 package, which returns a single row with three values (neither of them correct) though I might of course be doing something wrong:
test<-melt(HT2018, id="PNR")
test<-cast(test, QUAL_RATING + MARK ~ PNR)
structure(list(PNR = c(1234567890, 1234567890, 1234567890, 1234567890,
2345678901, 2345678901, 2345678901), ZIPCODE = c(46395L, 46395L,
46395L, 46395L, 42332L, 42332L, 42332L), SEL_CRITERION = structure(c(2L,
2L, 2L, 2L, 1L, 1L, 1L), .Label = c("B5", "HB "), class = "factor"),
QUAL_RATING = c(55L, 55L, 55L, 55L, 2645L, 2645L, 2645L),
QUEUENUMBER = c(0L, 0L, 0L, 0L, 0L, 0L, 0L), UPSEC_ID = structure(c(2L,
2L, 2L, 2L, 1L, 1L, 1L), .Label = c("3SB", "HRF"), class = "factor"),
UPSEC_COURSE_ID = structure(c(7L, 3L, 5L, 1L, 2L, 4L, 6L), .Label = c("<null>",
"EN1201 ", "EN200 ", "MA1201 ", "MA200 ",
"SV1201 ", "SV203 "), class = "factor"), MARK = structure(c(2L,
3L, 3L, 1L, 3L, 3L, 3L), .Label = c("<null>", "G ", "VG "
), class = "factor"), COURSEOFFERING_ID = c(97113L, 97113L,
97113L, 97113L, 97113L, 97113L, 97113L), ADMISSIONROUND_ID = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L), .Label = "HT2018 ", class = "factor"),
RESULT = c(20L, 20L, 20L, 20L, 20L, 20L, 20L), WITHIN_PROGRAM = c(0L,
0L, 0L, 0L, 0L, 0L, 0L), SUMMA = structure(c(1L, 1L, 1L,
1L, 2L, 2L, 2L), .Label = c("67.5", "70.5"), class = "factor")), class = "data.frame", row.names = c(NA,
-7L))
Upvotes: 0
Views: 58
Reputation: 1364
Pending clarification on the aggregation of values HB/B5 (comment), you can use data.table's dcast() to convert the data to wide format across MARK levels of UPSEC_COURSE_ID.
dt = data.table(dt)
dt_betyg = dcast(dt, PNR + ZIPCODE + UPSEC_ID + QUEUENUMBER + SEL_CRITERION + QUAL_RATING ~ UPSEC_COURSE_ID, value.var = c('MARK'))
Result
> dt_betyg
PNR ZIPCODE UPSEC_ID QUEUENUMBER SEL_CRITERION QUAL_RATING <null> EN1201 EN200 MA1201 MA200 SV1201 SV203
1: 1234567890 46395 HRF 0 HB 55 <null> <NA> VG <NA> VG <NA> G
2: 2345678901 42332 3SB 0 B5 2645 <NA> VG <NA> VG <NA> VG <NA>
Upvotes: 1