Reputation: 9
I have a CSV with a column format similar to this:
Section | ID | Totaltime | Item1/Word | Item1/Cat | Item1/Time...Item235/Time
I would like to reshape this so that instead of all 235 entries per ID on a single row, there is a row per item, sorted/chunked by ID, so it looks similar to this-
Section | ID0 | Totaltime | Item1/Word | Item1/Cat | Item1/Time
Item2/Word | Item2/Cat | Item2/Time
Item3/Word | Item3/Cat | Item3/Time
...Item235/Word | Item235/Cat | Item235/Time
Section | ID1 | Totaltime | Item1/Word | Item1/Cat | Item1/Time...
I've tried to melt it using the ID as the vars.id argument, and the various Items pulled together with a grepl into the measures.vars argument, but this results in something like this-
Section | ID0 | Totaltime
Section | ID0 | Item1/Word
Section | ID0 | Item1/Cat
Section | ID0 | Item1/Time
...
Section | ID0 | Item235/Word
Section | ID0 | Item235/Cat
Section | ID0 | Item235/Time
I've also tried recasting this, but without much luck.
I'm new to R as of this week, so I'm sure there is probably something super obvious I'm missing, but I've hit a wall on this.
Upvotes: 0
Views: 99
Reputation: 118809
melt
from data.table v1.9.5+ can operate on multiple columns. (using @rawr's data)
require(data.table) # v1.9.5+
vals = unique(gsub("Item[0-9]+/", "", tail(names(dd), -3L)))
melt(setDT(dd), id=1:3, measure=lapply(vals, grep, names(dd)), value.name=vals)
# Section ID0 Totaltime variable Word Cat Time
# 1: 1 10001 100 1 1/word 1/cat 1/time
# 2: 2 10002 200 1 1/word 1/cat 1/time
# 3: 3 10003 300 1 1/word 1/cat 1/time
# 4: 4 10004 400 1 1/word 1/cat 1/time
# 5: 5 10005 500 1 1/word 1/cat 1/time
# 6: 1 10001 100 2 2/word 2/cat 2/time
# 7: 2 10002 200 2 2/word 2/cat 2/time
# 8: 3 10003 300 2 2/word 2/cat 2/time
# 9: 4 10004 400 2 2/word 2/cat 2/time
# 10: 5 10005 500 2 2/word 2/cat 2/time
# 11: 1 10001 100 3 3/word 3/cat 3/time
# 12: 2 10002 200 3 3/word 3/cat 3/time
# 13: 3 10003 300 3 3/word 3/cat 3/time
# 14: 4 10004 400 3 3/word 3/cat 3/time
# 15: 5 10005 500 3 3/word 3/cat 3/time
Upvotes: 1
Reputation: 173
Try this
library(reshape2)
library(plyr)
df.melt <- melt(df, id.vars=c("Section", "ID0", "Totaltime"), variable.name="item.type", value.name="item.value")
df.mutate <- mutate(df.melt, item.no=gsub("(Item[0-9]+).*", "\\1", item.type), item.type=gsub("Item[0-9]+/", "", item.type)
df.final <- ddply(df.mutate, .(Section, ID0, Totaltime, item.no), function(d) df.final <- ddply(df.mutate, .(Section, ID0, Totaltime, item.no), function(d) dcast(d, Section + ID0 + Totaltime ~ item.type, value.var="item.value", fun.aggregate=function(x) x[1]))
Upvotes: 0
Reputation: 20811
I think this gets the format you need:
dd <- data.frame(Section = 1:5, ID0 = 10001:10005, Totaltime = 1:5 * 100,
'Item1/Word' = '1/word', 'Item1/Cat' = '1/cat',
'Item1/Time' = '1/time',
'Item2/Word' = '2/word', 'Item2/Cat' = '2/cat',
'Item2/Time' = '2/time',
'Item3/Word' = '3/word', 'Item3/Cat' = '3/cat',
'Item3/Time' = '3/time', stringsAsFactors = FALSE,
check.names = FALSE)
# Section ID0 Totaltime Item1/Word Item1/Cat Item1/Time Item2/Word Item2/Cat Item2/Time Item3/Word Item3/Cat Item3/Time
# 1 1 10001 100 1/word 1/cat 1/time 2/word 2/cat 2/time 3/word 3/cat 3/time
# 2 2 10002 200 1/word 1/cat 1/time 2/word 2/cat 2/time 3/word 3/cat 3/time
# 3 3 10003 300 1/word 1/cat 1/time 2/word 2/cat 2/time 3/word 3/cat 3/time
# 4 4 10004 400 1/word 1/cat 1/time 2/word 2/cat 2/time 3/word 3/cat 3/time
# 5 5 10005 500 1/word 1/cat 1/time 2/word 2/cat 2/time 3/word 3/cat 3/time
## define the varying columns:
keys <- paste0('Item', 1:3)
keys <- c('Word','Cat','Time')
l <- lapply(keys, function(x) grep(x, names(dd)))
rr <- reshape(dd, direction = 'long', varying = l)
rr <- rr[with(rr, order(Section, ID0, Totaltime)),
## `reshape` makes two extra variabes, time and id, we dont want
-which(names(rr) %in% c('id','time'))]
rr[, 1:3] <- lapply(rr[, 1:3], function(x) ifelse(duplicated(x), '', x))
`rownames<-`(rr, NULL)
# Section ID0 Totaltime Item1/Word Item1/Cat Item1/Time
# 1 1 10001 100 1/word 1/cat 1/time
# 2 2/word 2/cat 2/time
# 3 3/word 3/cat 3/time
# 4 2 10002 200 1/word 1/cat 1/time
# 5 2/word 2/cat 2/time
# 6 3/word 3/cat 3/time
# 7 3 10003 300 1/word 1/cat 1/time
# 8 2/word 2/cat 2/time
# 9 3/word 3/cat 3/time
# 10 4 10004 400 1/word 1/cat 1/time
# 11 2/word 2/cat 2/time
# 12 3/word 3/cat 3/time
# 13 5 10005 500 1/word 1/cat 1/time
# 14 2/word 2/cat 2/time
# 15 3/word 3/cat 3/time
Upvotes: 0