Reputation: 18750
Lets say I have a data.table
with these columns
nodeID
hour1aaa
hour1bbb
hour1ccc
hour2aaa
hour2bbb
hour2ccc
...
hour24aaa
hour24bbb
hour24ccc
for a total of 72 columns. Let's call it rawtable
I want to reshape it so I have
nodeID
hour
aaa
bbb
ccc
for a total of just these 5 columns
where the hour column will contain whichever hour from the original 72 that it should be.
Let's call it newshape
The way I'm doing it now is to use rbindlist
with 24 items where each item is the proper subset of the bigger data.table. Like this (except I'm leaving out most of the hours in my example)
newshape<-rbindlist(list(
rawtable[,list(nodeID, Hour=1, aaa=hour1aaa, bbb=hour1bbb, ccc=hour1ccc)],
rawtable[,list(nodeID, Hour=2, aaa=hour2aaa, bbb=hour2bbb, ccc=hour2ccc)],
rawtable[,list(nodeID, Hour=24, aaa=hour24aaa, bbb=hour24bbb, ccc=hour24ccc)]))
Here is some sample data to play with
rawtable<-data.table(nodeID=c(1,2),hour1aaa=c(12.4,32),hour1bbb=c(61.1,65.33),hour1ccc=c(-4.2,54),hour2aaa=c(12.2,1.2),hour2bbb=c(12.2,5.7),hour2ccc=c(5.6,101.9),hour24aaa=c(45.2,8.5),hour24bbb=c(23,7.9),hour24ccc=c(98,32.3))
Using my rbindlist
approach gives the desired result but, as with most things I do with R, there is probably a better way. By better I mean more memory efficient, faster, and/or uses less lines of code. Does anyone have a better way to achieve this?
Upvotes: 2
Views: 172
Reputation: 93938
This is a classic reshape
problem if you get your names in the standard convention it expects, though I'm not sure this really harnesses the efficiency of the data.table
structure:
reshape(
setNames(rawtable, gsub("(\\D+)(\\d+)(\\D+)", "\\3.\\2", names(rawtable))),
idvar="nodeID", direction="long", varying=-1
)
Result:
nodeID hour aaa bbb ccc
1: 1 1 12.4 61.10 -4.2
2: 2 1 32.0 65.33 54.0
3: 1 2 12.2 12.20 5.6
4: 2 2 1.2 5.70 101.9
5: 1 24 45.2 23.00 98.0
6: 2 24 8.5 7.90 32.3
@Arun
's answer over here: https://stackoverflow.com/a/15510828/496803 may also be useful if you can adapt it to your current data.
Upvotes: 3
Reputation: 193687
One option is to use merged.stack
from my package "splitstackshape". This function, stack
s groups of columns and then merges the output together. Because of how the function creates the "time" variable, you can specify whatever you wanted to strip out from the column names. In this case, we want to strip out "hour", "aaa", "bbb", and "ccc" and have just the numbers remaining.
library(splitstackshape)
## Make sure you're using at least 1.2.0
packageVersion("splitstackshape")
# [1] ‘1.2.0’
merged.stack(rawtable, id.vars="nodeID",
var.stubs=c("aaa", "bbb", "ccc"),
sep="hour|aaa|bbb|ccc")
# nodeID .time_1 aaa bbb ccc
# 1: 1 1 12.4 61.10 -4.2
# 2: 1 2 12.2 12.20 5.6
# 3: 1 24 45.2 23.00 98.0
# 4: 2 1 32.0 65.33 54.0
# 5: 2 2 1.2 5.70 101.9
# 6: 2 24 8.5 7.90 32.3
Upvotes: 2