Reputation: 7873
I have data frames like the following that I need to reformat into a single row, so that I can create a new data frame that's a collection of many of the simpler data frames, with one row in the new data frame representing all of the data of one of the simpler original data frames.
Here's a trivial example of the format of the original data frames:
> myDf = data.frame(Seconds=seq(0,1,.25), s1=seq(0,8,2), s2=seq(1,9,2))
>
> myDf
Seconds s1 s2
1 0.00 0 1
2 0.25 2 3
3 0.50 4 5
4 0.75 6 7
5 1.00 8 9
And below is what I want it to look like after being reformatted. Each column indicates rXsY, where "rX" indicates the row number of the original data frame, and "sY" indicates the "s1" or "s2" column of the original data frame. The "Seconds" column is omitted in the new data frame, as its information is implicit in the row number.
> myNewDf
r1s1 r1s2 r2s1 r2s2 r3s1 r3s2 r4s1 r4s2 r5s1 r5s2
1 0 1 2 3 4 5 6 7 8 9
I suspect this is really simple and probably involves some combination of reshape()
, melt()
, and/or cast()
, but the proper incantations are escaping me. I could post what I've tried, but I think it would just distract from what's probably a simple question? If anyone would like me to do so, just ask in the comments.
The ideal solution would also somehow programmatically generate the new column names based on the original data frame's column names, since the column names won't always be the same. Also, if it's not difficult, can I somehow simultaneously do this same operation to a list of similar data frames (all the same number of rows, all the same column names, but with differing values in their s1 & s2 columns)? Ultimately I need a single data frame that contains the data from multiple simpler data frames, like this...
> myCombinedNewDf # data combined from 4 separate original data frames
r1s1 r1s2 r2s1 r2s2 r3s1 r3s2 r4s1 r4s2 r5s1 r5s2
1 0 1 2 3 4 5 6 7 8 9
2 10 11 12 13 14 15 16 17 18 19
3 20 21 22 23 24 25 26 27 28 29
4 30 31 32 33 34 35 36 37 38 39
Upvotes: 4
Views: 5421
Reputation: 4474
base R solution
#prepare data
myDf1 = data.frame(Seconds=seq(0,1,.25), s1=seq(0,8,2), s2=seq(1,9,2))
myDf2 = data.frame(Seconds=seq(0,1,.25), s1=seq(10,18,2), s2=seq(11,19,2))
myDfList=list(myDf1,myDf2)
#allocate memory
myCombinedNewDf=data.frame(matrix(NA_integer_,nrow=length(myDfList),ncol=(ncol(myDf1)-1)*nrow(myDf1)))
#reformat
for (idx in 1:length(myDfList)) myCombinedNewDf[idx,]=c(t(myDfList[[idx]][,-1]))
#set colnames
colnames(myCombinedNewDf)=paste0("r",sort(rep.int(1:nrow(myDf1),2)),colnames(myDf1)[-1])
As per request an extended version that handles a separate factor column:
#allocate memory
#the first column should ultimately be a factor
#I would use a character column first and later change it to type factor
#note the stringsAsFactors option!
myCombinedNewDf=data.frame(rep(NA_character_,length(myDfList)),
matrix(NA_integer_,
nrow=length(myDfList),
ncol=(ncol(myDf1)-1)*nrow(myDf1)),
stringsAsFactors=FALSE)
#reformat
for (idx in 1:length(myDfList)) {
myCombinedNewDf[idx,-1]=c(t(myDfList[[idx]][,-1]))
#I have just made up some criterion to get one "yes" and one "no"
#"yes" if the sum of all values is below 100, "no" otherwise
myCombinedNewDf[idx,1]=if (sum(myDfList[[idx]][,-1])<100) "yes" else "no"
}
#set colnames
colnames(myCombinedNewDf)=c("flag",
paste0("r",
sort(rep.int(1:nrow(myDf1),2)),
colnames(myDf1)[-1])
)
myCombinedNewDf$flag=factor(myCombinedNewDf$flag)
myCombinedNewDf
Upvotes: 1
Reputation: 193517
The relevant values can be extracted row-wise by using c(t(therelevantdata))
.
In other words:
Values <- c(t(myDf[-1]))
If names are important at this point, you can do:
Names <- sprintf("r%ss%s", rep(1:5, each = 2), 1:2)
You can get a named vector with:
setNames(Values, Names)
# r1s1 r1s2 r2s1 r2s2 r3s1 r3s2 r4s1 r4s2 r5s1 r5s2
# 0 1 2 3 4 5 6 7 8 9
Or a named single-row data.frame
with:
setNames(data.frame(t(Values)), Names)
# r1s1 r1s2 r2s1 r2s2 r3s1 r3s2 r4s1 r4s2 r5s1 r5s2
# 1 0 1 2 3 4 5 6 7 8 9
If you have a list
of your data.frame
s, as shared in @cyro111's answer, you can easily do the following:
do.call(rbind, lapply(myDfList, function(x) c(t(x[-1]))))
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] 0 1 2 3 4 5 6 7 8 9
# [2,] 10 11 12 13 14 15 16 17 18 19
Convert to data.frame
with as.data.frame
and add the names with either names <-
or setNames
.
Generalized as a function:
myFun <- function(indf, asVec = TRUE) {
values <- c(t(indf[-1]))
Names <- sprintf("r%ss%s", rep(1:nrow(indf), each = ncol(indf[-1])),
1:ncol(indf[-1]))
out <- setNames(values, Names)
if (isTRUE(asVec)) out
else (as.data.frame(as.matrix(t(out))))
}
Try it out:
myFun(myDf) # Vector
myFun(myDf, FALSE) # data.frame
It's even more convenient on a list
of data.frame
s.... lots of options :-)
dfList1 <- list(
data.frame(s = 1:2, a1 = 1:2, a2 = 3:4, a3 = 5:6),
data.frame(s = 1:2, a1 = 11:12, a2 = 31:32, a3 = 51:52)
)
lapply(dfList1, myFun)
do.call(rbind, lapply(dfList1, myFun))
t(sapply(dfList1, myFun))
as.data.frame(do.call(rbind, lapply(dfList1, myFun)))
Upvotes: 3
Reputation: 887108
You can try dcast
from the devel version of data.table
i.e. v1.9.5 which can take multiple value.var
columns. Create two columns one with row number
('rn') and the second a grouping variable ('grp'), and use dcast
. The installation details are here
library(data.table)#v1.9.5+
dcast(setDT(myDf[-1])[, c('rn1', 'grp') := list(paste0('r', 1:.N), 1)],
grp~rn1, value.var=c('s1', 's2'))
# grp r1_s1 r2_s1 r3_s1 r4_s1 r5_s1 r1_s2 r2_s2 r3_s2 r4_s2 r5_s2
#1: 1 0 2 4 6 8 1 3 5 7 9
Or we can use reshape
from base R
reshape(transform(myDf, rn1=paste0('r', 1:nrow(myDf)), grp=1)[-1],
idvar='grp', timevar='rn1', direction='wide')
# grp s1.r1 s2.r1 s1.r2 s2.r2 s1.r3 s2.r3 s1.r4 s2.r4 s1.r5 s2.r5
#1 1 0 1 2 3 4 5 6 7 8 9
If we have several dataframes, we can place the datasets in a list and then use lapply
with dcast
or rbind the datasets in the list with rbindlist
specifying a grouping variable for each dataset,then apply dcast
on the whole dataset.
Using 'myOtherDF` from @Alex A.'s post
myDFList <- list(myDf, myOtherDF)
dcast(rbindlist(Map(cbind, myDFList, gr=seq_along(myDFList)))[,-1,
with=FALSE][, rn1:= paste0('r', 1:.N), by=gr],
gr~rn1, value.var=c('s1', 's2'))
# gr r1_s1 r2_s1 r3_s1 r4_s1 r5_s1 r1_s2 r2_s2 r3_s2 r4_s2 r5_s2
#1: 1 0 2 4 6 8 1 3 5 7 9
#2: 2 1 3 5 7 9 0 2 4 6 8
Upvotes: 2
Reputation: 5586
Using melt()
from reshape2
, you can do it like this:
library(reshape2)
# Melt the data, omitting `Seconds`
df.melted <- melt(myDF[, -1], id.vars = NULL)
# Transpose the values into a single row
myNewDF <- t(df.melted[, 2])
# Assign new variable names
colnames(myNewDF) <- paste0("r", rownames(myDF), df.melted[, 1])
# r1s1 r2s1 r3s1 r4s1 r5s1 r1s2 r2s2 r3s2 r4s2 r5s2
# 1 0 2 4 6 8 1 3 5 7 9
This melts the data frame, uses the first column (the variable names from the original dataset) to construct the variable names for the new dataset, and uses the transpose of the second column (the data values) as the row of data.
If you want an automated approach to combining your datasets, you can take this a step further:
# Another data frame
myOtherDF <- data.frame(Seconds = seq(0, 1, 0.25),
s1 = seq(1, 9, 2),
s2 = seq(0, 8, 2))
# Turn the above steps into a function
colToRow <- function(x) {
melted <- melt(x[, -1], id.vars = NULL)
row <- t(melted[, 2])
colnames(row) <- paste0("r", rownames(x), melted[, 1])
row
}
# Create a list of the data frames to process
myDFList <- list(myDF, myOtherDF)
# Apply our function to each data frame in the list and append
myNewDF <- data.frame(do.call(rbind, lapply(myDFList, colToRow)))
# r1s1 r2s1 r3s1 r4s1 r5s1 r1s2 r2s2 r3s2 r4s2 r5s2
# 1 0 2 4 6 8 1 3 5 7 9
# 2 1 3 5 7 9 0 2 4 6 8
Upvotes: 3