peter_w

Reshaping a dataframe to create repeating variables

I have a data set that records people's qualifications, with several rows per person and the variables already in wide format. I need to "widen" it even further so that each person has a single row, with the set of variables repeated across the columns (one set per qualification). You guessed it: the data needs to go into a spreadsheet template.

There will be a maximum of 10 rows per person, but no specified minimum.

Here's a simplified example of the data in its current form:

current <- structure(list(id = c("Bob", "Bob", "Bob", "Bob", "Jim", "Jim", 
"Jim", "Jim"), awarding.body = c("SQA", "SQA", "SQA", "SQA", 
"SQA", "SQA", "SQA", "SQA"), qual.type = c("HIGHER GRADE", "HIGHER GRADE", 
"STANDARD GRADE", "STANDARD GRADE", "HIGHER GRADE", "HIGHER GRADE", 
"STANDARD GRADE", "STANDARD GRADE"), year.awarded = c(1998L, 
1998L, 1996L, 1996L, 1999L, 1999L, 1997L, 1997L), band = c("A", 
"A", "B", "B", "B", "B", "A", "B"), subject = c("Mathematics", 
"Chemistry", "French", "Physics", "Fine Art", "Geography", "Craft & Design", 
"French")), .Names = c("id", "awarding.body", "qual.type", "year.awarded", 
"band", "subject"), class = "data.frame", row.names = c(NA, -8L
))
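Because the number of repeated column groups depends on how many rows each person has, it can help to count them first. A minimal base-R sketch, assuming only the id values from current above (copied into a standalone vector so the snippet runs on its own):

```r
# Hypothetical standalone copy of the id column from `current` above
id <- c("Bob", "Bob", "Bob", "Bob", "Jim", "Jim", "Jim", "Jim")

# Count rows (qualifications) per person
rows_per_person <- table(id)

# The largest count sets how many repeated column groups (.1, .2, ...) are needed
n_groups <- max(rows_per_person)
n_groups
# [1] 4

# One id column plus five variables per group
n_cols <- 1 + 5 * n_groups
n_cols
# [1] 21
```

With the stated maximum of 10 rows per person, the template would need at most 1 + 5 * 10 = 51 columns.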

Here is how I need the data to look:

desired <- structure(list(id = c("Bob", "Jim"), awarding.body.1 = c("SQA", 
"SQA"), qual.type.1 = c("HIGHER GRADE", "HIGHER GRADE"), year.awarded.1 = 1998:1999, 
    band.1 = c("A", "B"), subject.1 = c("Mathematics", "Fine Art"
    ), awarding.body.2 = c("SQA", "SQA"), qual.type.2 = c("HIGHER GRADE", 
    "HIGHER GRADE"), year.awarded.2 = 1998:1999, band.2 = c("A", 
    "B"), subject.2 = c("Chemistry", "Geography"), awarding.body.3 = c("SQA", 
    "SQA"), qual.type.3 = c("STANDARD GRADE", "STANDARD GRADE"
    ), year.awarded.3 = 1996:1997, band.3 = c("B", "A"), subject.3 = c("French", 
    "Craft & Design"), awarding.body.4 = c("SQA", "SQA"), qual.type.4 = c("STANDARD GRADE", 
    "STANDARD GRADE"), year.awarded.4 = 1996:1997, band.4 = c("B", 
    "B"), subject.4 = c("Physics", "French")), .Names = c("id", 
"awarding.body.1", "qual.type.1", "year.awarded.1", "band.1", 
"subject.1", "awarding.body.2", "qual.type.2", "year.awarded.2", 
"band.2", "subject.2", "awarding.body.3", "qual.type.3", "year.awarded.3", 
"band.3", "subject.3", "awarding.body.4", "qual.type.4", "year.awarded.4", 
"band.4", "subject.4"), class = "data.frame", row.names = c(NA, 
-2L))
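In the desired layout the qualification index varies slowest: all five variables for qualification 1, then all five for qualification 2, and so on. A short sketch that generates those column names with paste() makes the pattern explicit; vars and n_groups are illustrative names, not part of the original data:

```r
# The five per-qualification variables in `current` (everything except id)
vars <- c("awarding.body", "qual.type", "year.awarded", "band", "subject")
n_groups <- 4  # Bob and Jim each have four qualifications

# Repeat the variable names once per group and append the group number,
# so the index only changes after all five variables have been listed
wide_names <- c("id", paste(rep(vars, times = n_groups),
                            rep(seq_len(n_groups), each = length(vars)),
                            sep = "."))
length(wide_names)
# [1] 21
wide_names[1:7]
```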

I tried various things with the reshape2 package, but I don't think this is a typical reshaping problem. I've looked through other reshaping questions here but haven't found a solution.

Advice greatly appreciated.

Answers (1)

akrun

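Try numbering each person's rows with ave(), then use base R's reshape() with that index as the time variable: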

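# Number each person's rows 1, 2, 3, ... in order of appearance
current1 <- transform(current, indx = ave(seq_along(id), id, FUN = seq_along))
# Spread the other columns out by that index: awarding.body.1, qual.type.1, ...
desired1 <- reshape(current1, idvar = 'id', timevar = 'indx', direction = 'wide')
# Tidy up: drop the leftover row names and the attribute that reshape() attaches
row.names(desired1) <- NULL
attr(desired1, 'reshapeWide') <- NULL
all.equal(desired1, desired)
#[1] TRUE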
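The question says a person can have anywhere from one to ten rows. With the ave() + reshape() approach, people with fewer rows simply get NA in the slots they don't fill. A minimal sketch on hypothetical toy data (toy, Ann and Cal are made up for illustration):

```r
# Hypothetical toy data: Ann has 3 qualifications, Cal has only 1
toy <- data.frame(
  id      = c("Ann", "Ann", "Ann", "Cal"),
  subject = c("Biology", "History", "Music", "English"),
  band    = c("A", "B", "A", "C"),
  stringsAsFactors = FALSE
)

# Same steps as the answer: number rows within id, then go wide on that index
toy$indx <- ave(seq_along(toy$id), toy$id, FUN = seq_along)
toy$indx
# [1] 1 2 3 1

toy_wide <- reshape(toy, idvar = "id", timevar = "indx", direction = "wide")
row.names(toy_wide) <- NULL
attr(toy_wide, "reshapeWide") <- NULL

# Cal has no 2nd or 3rd qualification, so those cells are NA
names(toy_wide)
toy_wide
```

If the spreadsheet template expects empty cells rather than NA, write.csv(toy_wide, "out.csv", na = "", row.names = FALSE) is one way to blank them on export.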
