phonetagger
phonetagger

Reputation: 7873

How to reformat an R data frame with multiple rows into one row

I have data frames like the following that I need to reformat into a single row, so that I can create a new data frame that's a collection of many of the simpler data frames, with one row in the new data frame representing all of the data of one of the simpler original data frames.

Here's a trivial example of the format of the original data frames:

> myDf = data.frame(Seconds=seq(0,1,.25), s1=seq(0,8,2), s2=seq(1,9,2))
> 
> myDf
  Seconds s1 s2
1    0.00  0  1
2    0.25  2  3
3    0.50  4  5
4    0.75  6  7
5    1.00  8  9

And below is what I want it to look like after being reformatted. Each column indicates rXsY, where "rX" indicates the row number of the original data frame, and "sY" indicates the "s1" or "s2" column of the original data frame. The "Seconds" column is omitted in the new data frame, as its information is implicit in the row number.

> myNewDf
  r1s1 r1s2 r2s1 r2s2 r3s1 r3s2 r4s1 r4s2 r5s1 r5s2
1    0    1    2    3    4    5    6    7    8    9

I suspect this is really simple and probably involves some combination of reshape(), melt(), and/or cast(), but the proper incantations are escaping me. I could post what I've tried, but I think it would just distract from what's probably a simple question? If anyone would like me to do so, just ask in the comments.

The ideal solution would also somehow programmatically generate the new column names based on the original data frame's column names, since the column names won't always be the same. Also, if it's not difficult, can I somehow simultaneously do this same operation to a list of similar data frames (all the same number of rows, all the same column names, but with differing values in their s1 & s2 columns)? Ultimately I need a single data frame that contains the data from multiple simpler data frames, like this...

> myCombinedNewDf # data combined from 4 separate original data frames
  r1s1 r1s2 r2s1 r2s2 r3s1 r3s2 r4s1 r4s2 r5s1 r5s2
1    0    1    2    3    4    5    6    7    8    9
2   10   11   12   13   14   15   16   17   18   19
3   20   21   22   23   24   25   26   27   28   29
4   30   31   32   33   34   35   36   37   38   39

Upvotes: 4

Views: 5421

Answers (4)

cryo111
cryo111

Reputation: 4474

base R solution

#prepare data
myDf1 = data.frame(Seconds=seq(0,1,.25), s1=seq(0,8,2), s2=seq(1,9,2))
myDf2 = data.frame(Seconds=seq(0,1,.25), s1=seq(10,18,2), s2=seq(11,19,2))

myDfList=list(myDf1,myDf2)

#allocate memory
myCombinedNewDf=data.frame(matrix(NA_integer_,nrow=length(myDfList),ncol=(ncol(myDf1)-1)*nrow(myDf1)))

#reformat
for (idx in 1:length(myDfList))  myCombinedNewDf[idx,]=c(t(myDfList[[idx]][,-1]))

#set colnames
colnames(myCombinedNewDf)=paste0("r",sort(rep.int(1:nrow(myDf1),2)),colnames(myDf1)[-1])

As per request an extended version that handles a separate factor column:

#allocate memory
#the first column should ultimately be a factor
#I would use a character column first and later change it to type factor
#note the stringsAsFactors option!
myCombinedNewDf=data.frame(rep(NA_character_,length(myDfList)),
                       matrix(NA_integer_,
                              nrow=length(myDfList),
                              ncol=(ncol(myDf1)-1)*nrow(myDf1)),
                       stringsAsFactors=FALSE)

#reformat
for (idx in 1:length(myDfList))  {
  myCombinedNewDf[idx,-1]=c(t(myDfList[[idx]][,-1]))
  #I have just made up some criterion to get one "yes" and one "no"
  #"yes" if the sum of all values is below 100, "no" otherwise
  myCombinedNewDf[idx,1]=if (sum(myDfList[[idx]][,-1])<100) "yes" else "no"
}

#set colnames
colnames(myCombinedNewDf)=c("flag",
                        paste0("r",
                               sort(rep.int(1:nrow(myDf1),2)),
                               colnames(myDf1)[-1])
                        )
myCombinedNewDf$flag=factor(myCombinedNewDf$flag)
myCombinedNewDf

Upvotes: 1

A5C1D2H2I1M1N2O1R2T1
A5C1D2H2I1M1N2O1R2T1

Reputation: 193517

The relevant values can be extracted row-wise by using c(t(therelevantdata)).

In other words:

Values <- c(t(myDf[-1]))

If names are important at this point, you can do:

Names <- sprintf("r%ss%s", rep(1:5, each = 2), 1:2)

You can get a named vector with:

setNames(Values, Names)
# r1s1 r1s2 r2s1 r2s2 r3s1 r3s2 r4s1 r4s2 r5s1 r5s2 
#    0    1    2    3    4    5    6    7    8    9 

Or a named single-row data.frame with:

setNames(data.frame(t(Values)), Names)
#   r1s1 r1s2 r2s1 r2s2 r3s1 r3s2 r4s1 r4s2 r5s1 r5s2
# 1    0    1    2    3    4    5    6    7    8    9

If you have a list of your data.frames, as shared in @cyro111's answer, you can easily do the following:

do.call(rbind, lapply(myDfList, function(x) c(t(x[-1]))))
#      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,]    0    1    2    3    4    5    6    7    8     9
# [2,]   10   11   12   13   14   15   16   17   18    19

Convert to data.frame with as.data.frame and add the names with either names <- or setNames.


Generalized as a function:

myFun <- function(indf, asVec = TRUE) {
  values <- c(t(indf[-1]))
  Names <- sprintf("r%ss%s", rep(1:nrow(indf), each = ncol(indf[-1])),
                   1:ncol(indf[-1]))
  out <- setNames(values, Names)
  if (isTRUE(asVec)) out
  else (as.data.frame(as.matrix(t(out))))
}

Try it out:

myFun(myDf)        # Vector
myFun(myDf, FALSE) # data.frame

It's even more convenient on a list of data.frames.... lots of options :-)

dfList1 <- list(
  data.frame(s = 1:2, a1 = 1:2, a2 = 3:4, a3 = 5:6),
  data.frame(s = 1:2, a1 = 11:12, a2 = 31:32, a3 = 51:52)
)

lapply(dfList1, myFun)
do.call(rbind, lapply(dfList1, myFun))
t(sapply(dfList1, myFun))
as.data.frame(do.call(rbind, lapply(dfList1, myFun)))

Upvotes: 3

akrun
akrun

Reputation: 887108

You can try dcast from the devel version of data.table i.e. v1.9.5 which can take multiple value.var columns. Create two columns one with row number ('rn') and the second a grouping variable ('grp'), and use dcast. The installation details are here

library(data.table)#v1.9.5+
dcast(setDT(myDf[-1])[, c('rn1', 'grp') := list(paste0('r', 1:.N), 1)],
                   grp~rn1, value.var=c('s1', 's2'))
#   grp r1_s1 r2_s1 r3_s1 r4_s1 r5_s1 r1_s2 r2_s2 r3_s2 r4_s2 r5_s2
#1:   1     0     2     4     6     8     1     3     5     7     9

Or we can use reshape from base R

 reshape(transform(myDf, rn1=paste0('r', 1:nrow(myDf)), grp=1)[-1], 
         idvar='grp', timevar='rn1', direction='wide')
 #  grp s1.r1 s2.r1 s1.r2 s2.r2 s1.r3 s2.r3 s1.r4 s2.r4 s1.r5 s2.r5
 #1   1     0     1     2     3     4     5     6     7     8     9

Update

If we have several dataframes, we can place the datasets in a list and then use lapply with dcast or rbind the datasets in the list with rbindlist specifying a grouping variable for each dataset,then apply dcast on the whole dataset.

Using 'myOtherDF` from @Alex A.'s post

 myDFList <- list(myDf, myOtherDF)
 dcast(rbindlist(Map(cbind, myDFList, gr=seq_along(myDFList)))[,-1,
       with=FALSE][, rn1:= paste0('r', 1:.N), by=gr],
          gr~rn1, value.var=c('s1', 's2'))
 #   gr r1_s1 r2_s1 r3_s1 r4_s1 r5_s1 r1_s2 r2_s2 r3_s2 r4_s2 r5_s2
 #1:  1     0     2     4     6     8     1     3     5     7     9
 #2:  2     1     3     5     7     9     0     2     4     6     8

Upvotes: 2

Alex A.
Alex A.

Reputation: 5586

Using melt() from reshape2, you can do it like this:

library(reshape2)

# Melt the data, omitting `Seconds`
df.melted <- melt(myDF[, -1], id.vars = NULL)

# Transpose the values into a single row
myNewDF <- t(df.melted[, 2])

# Assign new variable names
colnames(myNewDF) <- paste0("r", rownames(myDF), df.melted[, 1])

#   r1s1 r2s1 r3s1 r4s1 r5s1 r1s2 r2s2 r3s2 r4s2 r5s2
# 1    0    2    4    6    8    1    3    5    7    9

This melts the data frame, uses the first column (the variable names from the original dataset) to construct the variable names for the new dataset, and uses the transpose of the second column (the data values) as the row of data.

If you want an automated approach to combining your datasets, you can take this a step further:

# Another data frame
myOtherDF <- data.frame(Seconds = seq(0, 1, 0.25),
                        s1 = seq(1, 9, 2),
                        s2 = seq(0, 8, 2))

# Turn the above steps into a function
colToRow <- function(x) {
    melted <- melt(x[, -1], id.vars = NULL)
    row <- t(melted[, 2])
    colnames(row) <- paste0("r", rownames(x), melted[, 1])
    row
}

# Create a list of the data frames to process
myDFList <- list(myDF, myOtherDF)

# Apply our function to each data frame in the list and append
myNewDF <- data.frame(do.call(rbind, lapply(myDFList, colToRow)))

#   r1s1 r2s1 r3s1 r4s1 r5s1 r1s2 r2s2 r3s2 r4s2 r5s2
# 1    0    2    4    6    8    1    3    5    7    9
# 2    1    3    5    7    9    0    2    4    6    8

Upvotes: 3

Related Questions