TYL
TYL

Reputation: 1637

Calculating range of column values for each subject in dataframe in R

I have a dataframe and I want to find the values for each subject (names), which will be the sum of a range of columns (index1:index2)

names = c('a', 'b', 'c')
index1 = c(5, 6, 4)
index2 = c(7,8,7)
time1 = c(1,1,1)
time2 = c(2,2,2)
time3 = c(3,3,3)
time4 = c(4,4,4)
time5 = c(5,5,5)
df = data.frame(names, index1, index2, time1, time2, time3, time4, time5)

    names index1 index2 time1 time2 time3 time4 time5
1     a      5      7     1     2     3     4     5
2     b      6      8     1     2     3     4     5
3     c      4      7     1     2     3     4     5

Output will be:

ans = 9 12 10

[5:7] for a will be 2,3,4 = 9

[6:8] for b will be 3,4,5 = 12

[4:7] for c will be 1,2,3,4 = 10

I have a huge dataset. This is just an example. Please help!

Upvotes: 0

Views: 812

Answers (2)

Ronak Shah
Ronak Shah

Reputation: 389055

Using mapply we can do

mapply(function(x, y, z) sum(df[x, y:z]), seq_len(nrow(df)), df$index1, df$index2)
#[1]  9 12 10

However, instead of subsetting dataframe in mapply, I believe splitting every row would be more efficient for bigger datasets.

mapply(function(x, y, z) sum(x[y:z]), 
        split(df, seq_len(nrow(df))), df$index1, df$index2)

A tidyverse approach could be to define n first (number of non-time columns), gather to long format, group_by each names and calculate sum

library(dplyr)
library(tidyr)

n <- 3

df %>%
  gather(key, value, -(1:n)) %>%
  group_by(names) %>%
  summarise(sum = sum(value[(index1[1] - n):(index2[1] - n)])) %>%
  pull(sum)

#[1]  9 12 10

Upvotes: 1

s_baldur
s_baldur

Reputation: 33498

An alternative:

apply(df, 1, function(x) sum(as.integer(x[x[["index1"]]:x[["index2"]]])))

Another one:

apply(df[-1], 1, function(x) sum((x[(x[["index1"]]:x[["index2"]]) - 1L])))

Upvotes: 0

Related Questions