Reputation: 9

Reshaping a csv in R a particular way

I have a CSV with a column format similar to this:

Section | ID | Totaltime | Item1/Word | Item1/Cat | Item1/Time...Item235/Time

I would like to reshape this so that instead of all 235 entries per ID on a single row, there is a row per item, sorted/chunked by ID, so it looks similar to this-

Section | ID0 | Totaltime | Item1/Word | Item1/Cat | Item1/Time 
                            Item2/Word | Item2/Cat | Item2/Time
                            Item3/Word | Item3/Cat | Item3/Time
                           ...Item235/Word | Item235/Cat | Item235/Time
Section | ID1 | Totaltime | Item1/Word | Item1/Cat | Item1/Time...

I've tried to melt it using the ID as the vars.id argument, and the various Items pulled together with a grepl into the measures.vars argument, but this results in something like this-

Section | ID0 | Totaltime
Section | ID0 | Item1/Word 
Section | ID0 | Item1/Cat 
Section | ID0 | Item1/Time 
             ...
Section | ID0 | Item235/Word 
Section | ID0 | Item235/Cat 
Section | ID0 | Item235/Time

I've also tried recasting this, but without much luck.

I'm new to R as of this week, so I'm sure there is probably something super obvious I'm missing, but I've hit a wall on this.

Upvotes: 0

Answers (3)

Arun

Reputation: 118809

melt from data.table v1.9.5+ can operate on multiple columns. (using @rawr's data)

require(data.table) # v1.9.5+
vals = unique(gsub("Item[0-9]+/", "", tail(names(dd), -3L)))
melt(setDT(dd), id=1:3, measure=lapply(vals, grep, names(dd)), value.name=vals)
#     Section   ID0 Totaltime variable   Word   Cat   Time
#  1:       1 10001       100        1 1/word 1/cat 1/time
#  2:       2 10002       200        1 1/word 1/cat 1/time
#  3:       3 10003       300        1 1/word 1/cat 1/time
#  4:       4 10004       400        1 1/word 1/cat 1/time
#  5:       5 10005       500        1 1/word 1/cat 1/time
#  6:       1 10001       100        2 2/word 2/cat 2/time
#  7:       2 10002       200        2 2/word 2/cat 2/time
#  8:       3 10003       300        2 2/word 2/cat 2/time
#  9:       4 10004       400        2 2/word 2/cat 2/time
# 10:       5 10005       500        2 2/word 2/cat 2/time
# 11:       1 10001       100        3 3/word 3/cat 3/time
# 12:       2 10002       200        3 3/word 3/cat 3/time
# 13:       3 10003       300        3 3/word 3/cat 3/time
# 14:       4 10004       400        3 3/word 3/cat 3/time
# 15:       5 10005       500        3 3/word 3/cat 3/time

Upvotes: 1

jwilley44

Reputation: 173

Try this

library(reshape2)
library(plyr)
df.melt <- melt(df, id.vars=c("Section", "ID0", "Totaltime"), variable.name="item.type", value.name="item.value")
df.mutate <- mutate(df.melt, item.no=gsub("(Item[0-9]+).*", "\\1", item.type), item.type=gsub("Item[0-9]+/", "", item.type)
df.final <- ddply(df.mutate, .(Section, ID0, Totaltime, item.no), function(d) df.final <- ddply(df.mutate, .(Section, ID0, Totaltime, item.no), function(d) dcast(d, Section + ID0 + Totaltime ~ item.type, value.var="item.value", fun.aggregate=function(x) x[1]))

Upvotes: 0

rawr

Reputation: 20811

I think this gets the format you need:

dd <- data.frame(Section = 1:5, ID0 = 10001:10005, Totaltime = 1:5 * 100,
                 'Item1/Word' = '1/word', 'Item1/Cat' = '1/cat',
                 'Item1/Time' = '1/time',
                 'Item2/Word' = '2/word', 'Item2/Cat' = '2/cat',
                 'Item2/Time' = '2/time',
                 'Item3/Word' = '3/word', 'Item3/Cat' = '3/cat',
                 'Item3/Time' = '3/time', stringsAsFactors = FALSE,
                 check.names = FALSE)


#   Section   ID0 Totaltime Item1/Word Item1/Cat Item1/Time Item2/Word Item2/Cat Item2/Time Item3/Word Item3/Cat Item3/Time
# 1       1 10001       100     1/word     1/cat     1/time     2/word     2/cat     2/time     3/word     3/cat     3/time
# 2       2 10002       200     1/word     1/cat     1/time     2/word     2/cat     2/time     3/word     3/cat     3/time
# 3       3 10003       300     1/word     1/cat     1/time     2/word     2/cat     2/time     3/word     3/cat     3/time
# 4       4 10004       400     1/word     1/cat     1/time     2/word     2/cat     2/time     3/word     3/cat     3/time
# 5       5 10005       500     1/word     1/cat     1/time     2/word     2/cat     2/time     3/word     3/cat     3/time

## define the varying columns:
keys <- paste0('Item', 1:3)
keys <- c('Word','Cat','Time')
l <- lapply(keys, function(x) grep(x, names(dd)))

rr <- reshape(dd, direction = 'long', varying = l)
rr <- rr[with(rr, order(Section, ID0, Totaltime)),
         ## `reshape` makes two extra variabes, time and id, we dont want
         -which(names(rr) %in% c('id','time'))]
rr[, 1:3] <- lapply(rr[, 1:3], function(x) ifelse(duplicated(x), '', x))
`rownames<-`(rr, NULL)

#    Section   ID0 Totaltime Item1/Word Item1/Cat Item1/Time
# 1        1 10001       100     1/word     1/cat     1/time
# 2                              2/word     2/cat     2/time
# 3                              3/word     3/cat     3/time
# 4        2 10002       200     1/word     1/cat     1/time
# 5                              2/word     2/cat     2/time
# 6                              3/word     3/cat     3/time
# 7        3 10003       300     1/word     1/cat     1/time
# 8                              2/word     2/cat     2/time
# 9                              3/word     3/cat     3/time
# 10       4 10004       400     1/word     1/cat     1/time
# 11                             2/word     2/cat     2/time
# 12                             3/word     3/cat     3/time
# 13       5 10005       500     1/word     1/cat     1/time
# 14                             2/word     2/cat     2/time
# 15                             3/word     3/cat     3/time

Upvotes: 0

Reshaping a csv in R a particular way

Answers (3)

Related Questions