Reputation: 1637
I have a dataframe and I want to find the values for each subject (names), which will be the sum of a range of columns (index1:index2)
names = c('a', 'b', 'c')
index1 = c(5, 6, 4)
index2 = c(7,8,7)
time1 = c(1,1,1)
time2 = c(2,2,2)
time3 = c(3,3,3)
time4 = c(4,4,4)
time5 = c(5,5,5)
df = data.frame(names, index1, index2, time1, time2, time3, time4, time5)
names index1 index2 time1 time2 time3 time4 time5
1 a 5 7 1 2 3 4 5
2 b 6 8 1 2 3 4 5
3 c 4 7 1 2 3 4 5
Output will be:
ans = 9 12 10
[5:7] for a will be 2,3,4 = 9
[6:8] for b will be 3,4,5 = 12
[4:7] for c will be 1,2,3,4 = 10
I have a huge dataset. This is just an example. Please help!
Upvotes: 0
Views: 812
Reputation: 389055
Using mapply
we can do
mapply(function(x, y, z) sum(df[x, y:z]), seq_len(nrow(df)), df$index1, df$index2)
#[1] 9 12 10
However, instead of subsetting dataframe in mapply
, I believe splitting every row would be more efficient for bigger datasets.
mapply(function(x, y, z) sum(x[y:z]),
split(df, seq_len(nrow(df))), df$index1, df$index2)
A tidyverse
approach could be to define n
first (number of non-time
columns), gather
to long format, group_by
each names
and calculate sum
library(dplyr)
library(tidyr)
n <- 3
df %>%
gather(key, value, -(1:n)) %>%
group_by(names) %>%
summarise(sum = sum(value[(index1[1] - n):(index2[1] - n)])) %>%
pull(sum)
#[1] 9 12 10
Upvotes: 1
Reputation: 33498
An alternative:
apply(df, 1, function(x) sum(as.integer(x[x[["index1"]]:x[["index2"]]])))
Another one:
apply(df[-1], 1, function(x) sum((x[(x[["index1"]]:x[["index2"]]) - 1L])))
Upvotes: 0