John Abercrombie
John Abercrombie

Reputation: 61

Pasting multiple data frames together while keeping data in specific row and columns

So right now I have an master data frame with 230 rows and 15132 column (all labeled with specific column and row names) with the value 0 stored in each observation. I have multiple data frames with differing columns and rows that I would like to paste into this master data frame while maintaining the correct placement. Each of these data frames consist of various combinations of the row/column names of the master data frame. How would I go about doing this?

dempty <- data.frame(matrix(0, nrow = 5, ncol = 5))
rownames(dempty) <- c("v", "w", "x", "y", "z")
colnames(dempty) <- c("a", "b", "c", "d", "e")


d1 <- data.frame(matrix(1, nrow = 2, ncol = 3))
rownames(d1) <- c("x", "y")
colnames(d1) <- c("b", "c", "d")

d2 <- data.frame(matrix(2, nrow = 3, ncol = 4))
rownames(d2) <- c("v", "w", "x")
colnames(d2) <- c("a", "b", "c", "d")

d3 <- data.frame(matrix(3, nrow = 4, ncol = 2))
rownames(d3) <- c("w", "x", "y", "z")
colnames(d3) <- c("d", "e")

dfinal <- data.frame(matrix(c(2, 2, 2, 0, 0, 2, 2, "1;2", "1;2", 0, 2, 2, "1;2", "1;2", 0, 2, "2;3", "1;2;3", "1;3", 3, 0, 3, 3, 3, 3), nrow = 5, ncol= 5))
rownames(dfinal) <- c("v", "w", "x", "y", "z")
colnames(dfinal) <- c("a", "b", "c", "d", "e")

Upvotes: 0

Views: 320

Answers (2)

Joshua Mire
Joshua Mire

Reputation: 736

This loop should populate dfinal with values from df1, df2, and df3. It can be easily edited for your actual problem.

## get row an col names of master data to be populated
cols<-names(dfinal)
rows<-row.names(dfinal)
## iterate through the master data frame
for(i in 1:length(cols)){
  for(j in 1:length(rows)){
    ## if df1 contains matching col/row names use ifelse to replace
    if(cols[i] %in% names(df1) & rows[j] %in% row.names(df1)){
      ## if current row/col in dfinal is still 0 replace with df1 value, else append df1 value
      dfinal[rows[j],cols[i]] <- ifelse(dfinal[rows[j],cols[i]] == 0, df1[rows[j],cols[i]], paste0(dfinal[rows[j],cols[i]],";",df1[rows[j],cols[i]]))
    }
    ## if current row/col in dfinal is still 0 replace with df2 value, else append df2 value
    if(cols[i] %in% names(df2) & rows[j] %in% row.names(df2)){
      ## if current row/col in dfinal is still 0 replace with df2 value, else append df2 value
      dfinal[rows[j],cols[i]] <- ifelse(dfinal[rows[j],cols[i]] == 0, df2[rows[j],cols[i]], paste0(dfinal[rows[j],cols[i]],";",df2[rows[j],cols[i]]))
    }
    ## if current row/col in dfinal is still 0 replace with df3 value, else append df3 value
    if(cols[i] %in% names(df3) & rows[j] %in% row.names(df3)){
      ## if current row/col in dfinal is still 0 replace with df3 value, else append df3 value
      dfinal[rows[j],cols[i]] <- ifelse(dfinal[rows[j],cols[i]] == 0, df3[rows[j],cols[i]], paste0(dfinal[rows[j],cols[i]],";",df3[rows[j],cols[i]]))
    }
  }
}

I hope this helps!

Upvotes: 2

r2evans
r2evans

Reputation: 160417

Using this sample data:

dat <- `dimnames<-`(data.frame(matrix(0, nrow=3, ncol=5)), list(letters[1:3], LETTERS[1:5]))
dat
#   A B C D E
# a 0 0 0 0 0
# b 0 0 0 0 0
# c 0 0 0 0 0

df1 <- `rownames<-`(data.frame(B=2:3, C=22:23), c("a", "c"))
df1
#   B  C
# a 2 22
# c 3 23

You can brute-force replacements with this:

dat[,colnames(df1)] <- Map(function(x, y, rn) replace(x, rn, y),
                           dat[,colnames(df1)], df1,
                           list(match(rownames(df1), rownames(dat))))
dat
#   A B  C D E
# a 0 2 22 0 0
# b 0 0  0 0 0
# c 0 3 23 0 0

If your data has row indices (row names) as an actual column, though, it would likely be easier and more robust to merge or use one of the dplyr::*_join functions with some post-merge cleanup. Base R's merge does support merging by row name, though it feels less fluid:

newdat <- merge(dat, df1, by=0, all=TRUE)
newdat
#   Row.names A B.x C.x D E B.y C.y
# 1         a 0   2  22 0 0   2  22
# 2         b 0   0   0 0 0  NA  NA
# 3         c 0   3  23 0 0   3  23

(cleanupvars <- gsub("\\.x$", "", grep("\\.x$", colnames(newdat), value = TRUE)))
# [1] "B" "C"
for (nm in cleanupvars) {
  newdat[[nm]] <- ifelse(is.na(newdat[[paste0(nm, ".y")]]), newdat[[paste0(nm, ".x")]], newdat[[paste0(nm, ".y")]])
  newdat[,paste0(nm, c(".x", ".y"))] <- NULL
}
newdat
#   Row.names A D E B  C
# 1         a 0 0 0 2 22
# 2         b 0 0 0 0  0
# 3         c 0 0 0 3 23
newdat$Row.names <- NULL

(Column order is not maintained.)

This last method is clunky, no doubt.

Upvotes: 0

Related Questions