Sum observations from two columns, looping over many columns in R

Question

I have searched high and low, but am stuck on how to approach this. I have two data frames, and I want to sum their matching columns row by row, repeating this across many columns. If I were to do this manually, I would want:

df1[1,1]+df2[1,1]
df1[2,1]+df2[2,1]

etc... I've found many helpful examples on how to do something like:

apply(df[,c("a","d")], 1, sum)

though I want to do this over lots of columns. Also, although it's not strictly relevant, I want my example to match my real data as closely as possible, so it includes NAs, since my actual data contain many missing values.

# make a data frame, df1, with three columns of 50 rows each
a <- sample(1:100, 50, replace = TRUE)
b <- sample(100:300, 50, replace = TRUE)
c <- sample(2:50, 50, replace = TRUE)
df1 <- data.frame(a, b, c)

# make another data frame, df2, with three columns of 50 rows each
x <- sample(1:100, 50, replace = TRUE)
y <- sample(100:300, 50, replace = TRUE)
z <- sample(2:50, 50, replace = TRUE)
df2 <- data.frame(x, y, z)

To randomly insert a few NAs, I use this function from http://www.r-bloggers.com/function-to-generate-a-random-data-set/:

NAins <- NAinsert <- function(df, prop = .1) {
    n <- nrow(df)
    m <- ncol(df)
    num.to.na <- ceiling(prop * n * m)       # number of cells to set to NA
    id <- sample(0:(m * n - 1), num.to.na, replace = FALSE)
    rows <- id %/% m + 1                     # row index of each chosen cell
    cols <- id %% m + 1                      # column index of each chosen cell
    sapply(seq(num.to.na), function(x) {
        df[rows[x], cols[x]] <<- NA          # updates df inside NAinsert
    })
    return(df)
}

Add the NAs to the frames

# assign the result back; NAins returns a modified copy
df1 <- NAins(df1, .2)
df2 <- NAins(df2, .14)

Then I tried to iterate over the columns of each data frame with seq_along, using apply with the margin set to 1 to sum across each row. This doesn't work:

for(i in seq_along(df1)){
    for(j in seq_along(df2)){ 
            apply(c(df1[,i], col2[j]), 1, function(x) sum(x, na.rm = T))}}

Thanks for any help!

A5C1D2H2I1M1N2O1R2T1 · Accepted Answer

You should be able to just replace NA with 0, and then add with "+":

replace(df1, is.na(df1), 0) + replace(df2, is.na(df2), 0)
#    X  Y  Z
# 1  7 19  6
# 2 11 12  1
# 3 16 14 11
# 4 13  7 13
# 5 10  2 11
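
To see why the zero fill matters, here is a minimal sketch on two tiny, hand-made frames (d1 and d2 are hypothetical, not the sample data from this answer). It compares plain addition, where an NA on either side makes the result NA, with the zero-filled version:

```r
# Two tiny frames with hand-picked values (hypothetical illustration data)
d1 <- data.frame(X = c(1, NA, 3), Y = c(NA, 5, 6))
d2 <- data.frame(X = c(10, 20, NA), Y = c(NA, 50, 60))

# Plain addition: any NA operand makes that cell NA
plain <- d1 + d2

# Zero-filled addition: each NA is treated as 0 before summing
filled <- replace(d1, is.na(d1), 0) + replace(d2, is.na(d2), 0)

print(plain)
print(filled)
```

Note that a cell that is NA in both frames (row 1 of Y here) comes out as 0 rather than NA. If you need to tell "both missing" apart from a true zero, you would have to handle that case separately.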

Alternatively, if you have more than just two data.frames, you can collect them in a list and use Reduce:

Reduce("+", lapply(mget(c("df1", "df2", "df3")), function(x) replace(x, is.na(x), 0)))
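
As a rough sketch of the same idea with an explicit list instead of mget, the zero fill and the Reduce step can be wrapped in a small helper. The name sum_frames and the frames f1, f2, f3 are made up for illustration, and the frames are assumed to have identical dimensions and column names:

```r
# Hypothetical helper: sum any number of same-shaped data frames, counting NA as 0
sum_frames <- function(frames) {
  zero_filled <- lapply(frames, function(x) replace(x, is.na(x), 0))
  Reduce(`+`, zero_filled)
}

# Small hand-made inputs (illustration only)
f1 <- data.frame(X = c(1, NA), Y = c(2, 3))
f2 <- data.frame(X = c(4, 5), Y = c(NA, 6))
f3 <- data.frame(X = c(NA, 7), Y = c(8, 9))

total <- sum_frames(list(f1, f2, f3))
print(total)
```

Passing a list directly avoids depending on the objects' names in the calling environment, which is what mget relies on.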

Here's some sample data (and what I think is an easier way to create it):

set.seed(1) ## Set a seed so others can reproduce your sample data

dfmaker <- function() {
  setNames(
    data.frame(
      replicate(3, sample(c(NA, 1:10), 5, TRUE), FALSE)), 
    c("X", "Y", "Z"))
}

df1 <- dfmaker()
df1
#   X  Y Z
# 1 2  9 2
# 2 4 10 1
# 3 6  7 7
# 4 9  6 4
# 5 2 NA 8

df2 <- dfmaker()
df2
#    X  Y  Z
# 1  5 10  4
# 2  7  2 NA
# 3 10  7  4
# 4  4  1  9
# 5  8  2  3

df3 <- dfmaker()
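
If you would rather stay close to the column-by-column loop from the question, one possible sketch pairs the columns by position with Map and sums each pair with rowSums. This is an alternative I'm adding for comparison, not the approach above; g1 and g2 are hypothetical inputs, and the pairing assumes both frames have the same number of columns in the same order:

```r
# Hypothetical small inputs; columns are paired by position (1st with 1st, and so on)
g1 <- data.frame(a = c(1, NA, 3), b = c(4, 5, NA))
g2 <- data.frame(x = c(10, 20, 30), y = c(NA, 50, NA))

# For each pair of columns, bind them side by side and sum across each row,
# ignoring NAs
pair_sums <- Map(function(u, v) rowSums(cbind(u, v), na.rm = TRUE), g1, g2)

# Map returns a list named after the first frame's columns; turn it back into a frame
result <- as.data.frame(pair_sums)
print(result)
```

As with the zero-fill approach, a row that is NA in both columns sums to 0.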
