Reputation: 317
pixel230 pixel231 pixel232 pixel233 pixel234 pixel235...
61 191 254 254 254 254 ...
0 0 0 0 0 0 ...
0 160 207 6 0 0 ...
23 210 253 253 253 248 ...
0 0 0 8 211 254 ...
I want to sum over cols skipping some in between and add a new column. There are actually pixel0 through pixel783 and 600000rows. Need something similar to:
mutate(df,eureka=sum([pixel230:pixel239]+[pixel244:pixel252])
How do I index the df to get this to work? Thanks.
Upvotes: 3
Views: 2746
Reputation: 887048
An option using data.table
. Specify the columns (.SDcols
) that we need to get the sum
('nm1'), use Reduce
to sum the corresponding elements of those columns, assign (:=
) the output to new column ('eureka') (should be very fast for big datasets as it add columns by reference)
library(data.table)
nm1 <-paste0('pixel', c(230:231, 234:235))
setDT(df)[, eureka:=Reduce(`+`,.SD), .SDcols=nm1][]
# pixel230 pixel231 pixel232 pixel233 pixel234 pixel235 eureka
#1: 61 191 254 254 254 254 760
#2: 0 0 0 0 0 0 0
#3: 0 160 207 6 0 0 160
#4: 23 210 253 253 253 248 734
#5: 0 0 0 8 211 254 465
Suppose if there are missing values (NA
) in some of the columns, can replace it with '0' and do the +
df[3,1] <- NA
setDT(df)[, eureka := Reduce(`+`, lapply(.SD, function(x)
replace(x, which(is.na(x)), 0))), .SDcols=nm1][]
a similar approach using base R
is
df['eureka'] <- Reduce(`+`,lapply(df[nm1], function(x)
replace(x, which(is.na(x)), 0)))
df <- structure(list(pixel230 = c(61L, 0L, 0L, 23L, 0L),
pixel231 = c(191L, 0L, 160L, 210L, 0L), pixel232 = c(254L, 0L, 207L,
253L, 0L), pixel233 = c(254L, 0L, 6L, 253L, 8L), pixel234 = c(254L,
0L, 0L, 253L, 211L), pixel235 = c(254L, 0L, 0L, 248L, 254L
)), .Names = c("pixel230", "pixel231", "pixel232", "pixel233",
"pixel234", "pixel235"), class = "data.frame", row.names = c(NA, -5L))
Upvotes: 1
Reputation: 70266
Here's a straightforward base R approach using rowSums:
df$eureka <- rowSums(df[paste0('pixel', c(230:239, 244:252))], na.rm = TRUE)
Edited to include na.rm = TRUE
so you can also sum over columns that include NA entries.
The paste0('pixel', c(230:239, 244:252))
creates a vector of those column names you want to use for calculating the row sums. Because you supply that vector to df[...]
, the data is subsetted to only those columns for the rowSums, but all original columns remain in the "final" output + the new column.
Upvotes: 0
Reputation: 206197
You could use select()
to get the columns you want, then use rowSums()
to add them up. For example
library(dplyr)
#reproducible example
set.seed(15)
dd <- data.frame(matrix(rpois(7*4,5), ncol=7))
dd %>% select(num_range("X", c(1:2, 5:7))) %>% mutate(eureka=rowSums(.))
This gives
X1 X2 X5 X6 X7 eureka
1 5 4 5 7 5 26
2 3 11 9 7 6 36
3 9 7 3 4 2 25
4 6 3 7 2 3 21
To preserve all the variables, you can make a helper function
sum_num_range <- function(x, val, rng, name) {
select(x,num_range(val, rng)) %>%
transmute_(.dots=setNames(list(~rowSums(.)), name)) %>%
cbind(x,.)
}
dd %>% sum_num_range("X", c(1:2, 5:7), "eureka")
which returns
X1 X2 X3 X4 X5 X6 X7 eureka
1 5 4 6 5 5 7 5 26
2 3 11 7 6 9 7 6 36
3 9 7 2 7 3 4 2 25
4 6 3 6 7 7 2 3 21
or maybe more generally
sum_cols <- function(x, colspec, name) {
select_(x, .dots=list(lazyeval::lazy(colspec))) %>%
transmute_(.dots=setNames(list(~rowSums(.)), name)) %>%
cbind(x,.)
}
dd %>% sum_cols(num_range("X", c(1:2, 5:7)), "eureka")
which should work with the other select()
helper functions such as starts_with()
, ends_with()
, everything()
, etc. See the ?select
help page for more info.
Upvotes: 3