mbs1
mbs1

Reputation: 317

Summing over columns using mutate

pixel230    pixel231    pixel232    pixel233    pixel234    pixel235...               
61           191         254         254         254        254  ...
0              0          0            0           0        0   ...
0            160         207           6           0        0   ...
23           210         253         253         253       248  ...
0             0           0            8         211       254    ...

I want to sum over cols skipping some in between and add a new column. There are actually pixel0 through pixel783 and 600000rows. Need something similar to:

mutate(df,eureka=sum([pixel230:pixel239]+[pixel244:pixel252])

How do I index the df to get this to work? Thanks.

Upvotes: 3

Views: 2746

Answers (3)

akrun
akrun

Reputation: 887048

An option using data.table. Specify the columns (.SDcols) that we need to get the sum ('nm1'), use Reduce to sum the corresponding elements of those columns, assign (:=) the output to new column ('eureka') (should be very fast for big datasets as it add columns by reference)

library(data.table)
nm1 <-paste0('pixel', c(230:231, 234:235))
setDT(df)[, eureka:=Reduce(`+`,.SD), .SDcols=nm1][]
#    pixel230 pixel231 pixel232 pixel233 pixel234 pixel235 eureka
#1:       61      191      254      254      254      254    760
#2:        0        0        0        0        0        0      0
#3:        0      160      207        6        0        0    160
#4:       23      210      253      253      253      248    734
#5:        0        0        0        8      211      254    465

Suppose if there are missing values (NA) in some of the columns, can replace it with '0' and do the +

  df[3,1] <- NA
  setDT(df)[, eureka := Reduce(`+`, lapply(.SD, function(x)
             replace(x, which(is.na(x)), 0))), .SDcols=nm1][]

a similar approach using base R is

   df['eureka'] <- Reduce(`+`,lapply(df[nm1], function(x) 
                        replace(x, which(is.na(x)), 0)))

data

df <- structure(list(pixel230 = c(61L, 0L, 0L, 23L, 0L),
pixel231 =   c(191L, 0L, 160L, 210L, 0L), pixel232 = c(254L, 0L, 207L, 
253L, 0L), pixel233 = c(254L, 0L, 6L, 253L, 8L), pixel234 = c(254L, 
0L, 0L, 253L, 211L), pixel235 = c(254L, 0L, 0L, 248L, 254L
)), .Names = c("pixel230", "pixel231", "pixel232", "pixel233", 
"pixel234", "pixel235"), class = "data.frame", row.names = c(NA, -5L))

Upvotes: 1

talat
talat

Reputation: 70266

Here's a straightforward base R approach using rowSums:

df$eureka <- rowSums(df[paste0('pixel', c(230:239, 244:252))], na.rm = TRUE)

Edited to include na.rm = TRUE so you can also sum over columns that include NA entries.

The paste0('pixel', c(230:239, 244:252)) creates a vector of those column names you want to use for calculating the row sums. Because you supply that vector to df[...] , the data is subsetted to only those columns for the rowSums, but all original columns remain in the "final" output + the new column.

Upvotes: 0

MrFlick
MrFlick

Reputation: 206197

You could use select() to get the columns you want, then use rowSums() to add them up. For example

library(dplyr)

#reproducible example
set.seed(15)
dd <- data.frame(matrix(rpois(7*4,5), ncol=7))

dd %>% select(num_range("X", c(1:2, 5:7))) %>% mutate(eureka=rowSums(.))

This gives

  X1 X2 X5 X6 X7 eureka
1  5  4  5  7  5     26
2  3 11  9  7  6     36
3  9  7  3  4  2     25
4  6  3  7  2  3     21

To preserve all the variables, you can make a helper function

sum_num_range <- function(x, val, rng, name) {
    select(x,num_range(val, rng)) %>% 
        transmute_(.dots=setNames(list(~rowSums(.)), name)) %>% 
        cbind(x,.)
}

dd %>% sum_num_range("X", c(1:2, 5:7), "eureka")

which returns

  X1 X2 X3 X4 X5 X6 X7 eureka
1  5  4  6  5  5  7  5     26
2  3 11  7  6  9  7  6     36
3  9  7  2  7  3  4  2     25
4  6  3  6  7  7  2  3     21

or maybe more generally

sum_cols <- function(x, colspec, name) {
    select_(x, .dots=list(lazyeval::lazy(colspec))) %>% 
    transmute_(.dots=setNames(list(~rowSums(.)), name)) %>% 
    cbind(x,.)
}

dd %>% sum_cols(num_range("X", c(1:2, 5:7)), "eureka")

which should work with the other select() helper functions such as starts_with(), ends_with(), everything(), etc. See the ?select help page for more info.

Upvotes: 3

Related Questions