Steve
Steve

Reputation: 5977

Paste together two dataframes element by element

I need to paste, element by element, the contents of two data frames for input to another program. I have a data frame of means and a data frame of standard errors of the mean.

I tried using the R paste() function, but it doesn't seem to be able to cope with data frames. When using a vector, it seems to concatenate all the elements of the first vector into a string and all the elements of the second into a separate string. Instead, I need each reciprocal element in the two data frames to be concatenated together.

Any suggestions for how to approach this? I've included dummy input data (datMean and datSE) and my desired output (datNew). My real data frames are about 10 rows by 150 columns in size.

# means and SEM
datMean <- data.frame(a=rnorm(10, 3), b=rnorm(10, 3), d=rnorm(10, 3))
datSE <- data.frame(a=rnorm(10, 3)/100, b=rnorm(10, 3)/100, d=rnorm(10, 3)/100)

# what the output should look like
# i've chosen some arbitrary values here, and show only the first row. 
datNew <- data.frame(a="2.889-2.926", b="1.342-1.389", d="2.569-2.576")

The idea is for each element in datNew to be a range consisting of 'mean - se' and 'mean + se', separated by a dash '-' . The paste() function can do this for one element, how to do this over the whole dataframe?

paste(datMean[1,1] - datSE[1,1], datMean[1,1] + datSE[1,1], sep="-")

EDIT 1: Looking at some of the answers I realize I left out an important bit of information in the question. Each row of the original data frames is named, and I need to reconstitute the final data frame with these names. For example:

rownames(datMean) <- LETTERS[1:10]
rownames(datSE) <- LETTERS[1:10]

I need datNew to eventually have these 10 rownames again. This could be problematic with some of the solutions using melt().

Upvotes: 10

Views: 11463

Answers (5)

zx8754
zx8754

Reputation: 56159

Using mapply to paste and cbind to keep rownames:

x <- cbind(
  datMean[, 0],
  mapply(paste, round(datMean - datSE, 3), round(datMean + datSE, 3), sep = " - "))

x
#               a             b             d
# A 3.268 - 3.321 5.226 - 5.308   2.3 - 2.358
# B 3.795 - 3.874 1.772 - 1.833 2.265 - 2.335
# C 1.305 - 1.346 1.238 - 1.291 2.812 - 2.874
# D 1.957 - 2.041 3.016 - 3.057 2.402 - 2.473
# E  4.73 - 4.786 2.909 - 2.963 2.245 - 2.297
# F 3.511 - 3.554 3.547 - 3.603 2.316 - 2.374
# G 3.601 - 3.689 3.073 - 3.144 3.145 - 3.215
# H 2.056 - 2.118  2.597 - 2.69  2.58 - 2.627
# I 1.802 - 1.835 2.794 - 2.895   2.452 - 2.5
# J 2.399 - 2.461 1.807 - 1.844 3.199 - 3.254

class(x)
# [1] "data.frame"
identical(rownames(datMean), rownames(x))
# [1] TRUE

Upvotes: 0

Aaron - mostly inactive
Aaron - mostly inactive

Reputation: 37754

If you convert to matrices first, you can do it with no applies or loops at all.

MdatMean <- as.matrix(datMean)
MdatSE <- as.matrix(datSE)
matrix( paste(MdatMean - MdatSE, MdatMean + MdatSE, sep="-"), 
        nrow=nrow(MdatMean), dimnames=dimnames(MdatMean) )

You also might consider formatC for better formatting.

lo <- formatC(MdatMean - MdatSE, format="f", digits=3)
hi <- formatC(MdatMean + MdatSE, format="f", digits=3)
matrix( paste(lo, hi, sep="-"), 
        nrow=nrow(MdatMean), dimnames=dimnames(MdatMean) )

If you want a data.frame in the end just wrap the last line in as.data.frame.

Upvotes: 14

mdsumner
mdsumner

Reputation: 29477

You can do this on every row at once, but you are applying to paired columns between two data.frames. Since you have a specific paste job to do each time, define the function:

pfun <- function(x, y) paste(x - y, x + y, sep = "-")

and then construct the new data.frame with the function:

 datNew <- data.frame(a = pfun(datMean$a, datSE$a), b = pfun(datMean$b, datSE$b), d = pfun(datMean$d, datSE$d))

There would be terser ways to apply this, but perhaps that helps you understand better. You can pass whole columns to paste, but not whole data.frames.

Use a loop to match all columns in the result without specifying them individually.

First create a list to store all the columns, we will convert to data.frame with the right column names.

datNew <- vector("list", ncol(datMean))

The naming does assume that column number, names and order are an exact match between the two input data.frames.

names(datNew) <- names(datMean)

for (i in 1:ncol(datMean)) {
    datNew[[i]] <- pfun(datMean[[i]], datSE[[i]])
}

Convert to data.frame:

datNew <- as.data.frame(datNew)

Upvotes: 2

Roman Luštrik
Roman Luštrik

Reputation: 70643

Here's how I understand your problem. I melted the data for means and SE from multiple columns to one column using reshape2::melt.

library(reshape2)
datMean <- melt(datMean)$value
datSE <- melt(datSE)$value
dat <- cbind(datMean, datSE)

apply(X = dat, MARGIN = 1, FUN = function(x) {
            paste(x[1] - x[2], x[1] + x[2], sep = " - ")
        })

And the result

 [1] "3.03886802467251 - 3.08551547263516" 
 [2] "3.01803172559258 - 3.05247871975711" 
 [3] "3.4609230722069 - 3.56097173966387"  
 [4] "1.35368243309618 - 1.45548512578821" 
 [5] "2.39936853846605 - 2.47570756724791" 
 [6] "3.21849170272184 - 3.29653660329785" 

EDIT

This solution respects your original data dimensions. What I do is make a 3D array and work on each cell at a time with holding the third dimension ([x,y, 1:2]) constant.

dat <- array(c(datMean, datSE), dim = c(10, 3, 2))

datNEW <- matrix(rep(NA, nrow(dat)*ncol(dat)), ncol = ncol(dat))

for (column in seq(ncol(dat))) {
    cls <- rep(NA, nrow(dat))
    for (rows in seq(nrow(dat))) {
        tmp <- dat[rows, column, 1:2]
        cls[rows] <- paste(tmp[1] - tmp[2], tmp[1] + tmp[2], sep = " - ")
    }
    datNEW[, column] <- cls
}

Upvotes: 3

Sacha Epskamp
Sacha Epskamp

Reputation: 47551

Here is a way to do this without manually specifying each column. First we make the data and put them in an array using the abind package, rounding to 3 because that looks better:

datMean <- data.frame(a=rnorm(10, 3), b=rnorm(10, 3), d=rnorm(10, 3))
datSE <- data.frame(a=rnorm(10, 3)/100, b=rnorm(10, 3)/100, d=rnorm(10, 3)/100)

library(abind)

datArray <- round(abind(datMean,datSE,along=3),3)

Then we can apply the paste function to each element and column of this array:

apply(datArray,1:2,function(x)paste(x[1]-x[2],"-",x[1]+x[2]))

      a               b               d              
 [1,] "3.537 - 3.581" "3.358 - 3.436" "3.282 - 3.312"
 [2,] "2.452 - 2.516" "1.372 - 1.44"  "3.041 - 3.127"
 [3,] "3.017 - 3.101" "3.14 - 3.228"  "5.238 - 5.258"
 [4,] "3.397 - 3.451" "2.783 - 2.839" "3.381 - 3.405"
 [5,] "1.918 - 1.988" "2.978 - 3.02"  "3.44 - 3.504" 
 [6,] "4.01 - 4.078"  "3.014 - 3.068" "1.914 - 1.954"
 [7,] "3.475 - 3.517" "2.117 - 2.159" "1.871 - 1.929"
 [8,] "2.551 - 2.619" "3.907 - 3.975" "1.588 - 1.614"
 [9,] "1.707 - 1.765" "2.63 - 2.678"  "1.316 - 1.348"
[10,] "4.051 - 4.103" "3.532 - 3.628" "3.235 - 3.287"

Upvotes: 8

Related Questions