khowsrowpaviz
khowsrowpaviz

Reputation: 333

Averaging rows with changing number of columns

My data set has 60 columns and several hundred observations. Each observation has a certain length (depending on the length of the video that has been analyzed) and the last few columns might be just NA. I want to be able to average a portion of values for each row. For example, if the video length is 15 seconds, I need to average the first 3 seconds ( a fifth of the row) and if it is 60 seconds, I need the first 12 seconds average.

obs veideolength sec1 sec2 sec3 sec4 sec5 sec6 sec7 ... sec60
obs1 10 15 251 281 249 294 278 249 ... na
obs2 5 205 182 164 178 252 na na ... na
obs3 55 157 270 277 258 233 242 181 ... na
obs4 60 169 194 154 173 237 214 257 ... 187
obs5 30 187 159 222 235 275 196 169 ... na
obs6 20 198 254 227 247 210 193 289 ... na
obs7 60 198 271 225 157 205 192 170 ... 223
obs8 25 261 240 263 230 153 267 249 ... na

I have tried rowMeans but the problem is it does not accept a variable inside its arguments.

df$average1 <-rowMeans(df[,3:(3+floor(df$videolength/5))])

I also have tried for loop, but the variable j does not update and remains the first variable that it has been assigned.

for(i in 1:nrow(df)){
  j = (floor(df$VideoLength[i]/5)-1)
  frames1$average1 <-rowMeans(df[,3:(3+j)])
}

Upvotes: 1

Views: 119

Answers (3)

Ronak Shah
Ronak Shah

Reputation: 389275

Since you want to take mean of different number of columns for each row you cannot use rowMeans directly here. Here is one way using apply in a rowwise manner.

df$average <- apply(df[-1], 1, function(x) {
  mean(x[-1][seq_len(ceiling(x[1]/5))])
})

This is assuming your second column is veideolength and everything from 3rd column are the second columns.

Upvotes: 1

Lukas Wallrich
Lukas Wallrich

Reputation: 440

Just as an alternative to mapply, I personally find the purrr solution a lot more readable, if you are open to using the tidyverse.

library(purrr)
library(dplyr)

pmap(d, function(...) {
  current <- data.frame(...)
  cols <- floor(current$videolength/5)-1
  current[3:(3+cols)] %>% unlist() %>% mean()
})

Upvotes: 0

jay.sf
jay.sf

Reputation: 73692

In an mapply, for the first argument we transpose the interesting values of the data frame, and for the second argument we make sequences of length videolength/5. With the second we subset the former and calculate mean, mapply does this subsequently.

res <- mapply(function(x, y) mean(x[y]),
              unname(as.data.frame(t(d[-(1:2)]))),  ## transpose 
              Map(seq, 1, d$videolength/5))
res
# [1] 106.0000 123.0000 178.7273 179.2500 118.6667 231.0000 176.8333 204.4000

Data:

d <- structure(list(obs = c("obs1", "obs2", "obs3", "obs4", "obs5", 
"obs6", "obs7", "obs8"), videolength = c(15, 5, 55, 60, 30, 20, 
60, 25), sec1 = c(63L, 123L, 172L, 262L, 76L, 253L, 153L, 249L
), sec2 = c(167L, 19L, 150L, 265L, 283L, 298L, 170L, 279L), sec3 = c(88L, 
226L, 160L, 171L, 138L, 266L, 288L, 152L), sec4 = c(242L, 273L, 
123L, 39L, 77L, 107L, 69L, 213L), sec5 = c(160L, 38L, 211L, 205L, 
63L, 98L, 240L, 129L), sec6 = c(136L, NA, 18L, 46L, 75L, 239L, 
264L, 96L), sec7 = c(63L, NA, 240L, 252L, 130L, 75L, 184L, 122L
), sec8 = c(142L, NA, 229L, 28L, 271L, 116L, 162L, 216L), sec9 = c(38L, 
NA, 259L, 125L, 42L, 185L, 221L, 87L), sec10 = c(103L, NA, 128L, 
276L, 16L, 164L, 203L, 161L), sec11 = c(179L, NA, 276L, 222L, 
173L, 196L, 127L, 60L), sec12 = c(124L, NA, 144L, 260L, 150L, 
69L, 41L, 261L), sec13 = c(34L, NA, 17L, 256L, 141L, 68L, 108L, 
145L), sec14 = c(103L, NA, 272L, 238L, 222L, 187L, 147L, 164L
), sec15 = c(297L, NA, 200L, 276L, 273L, 57L, 100L, 75L), sec16 = c(NA, 
NA, 152L, 228L, 243L, 221L, 128L, 34L), sec17 = c(NA, NA, 54L, 
109L, 283L, 99L, 264L, 184L), sec18 = c(NA, NA, 19L, 20L, 26L, 
167L, 183L, 31L), sec19 = c(NA, NA, 47L, 266L, 207L, 254L, 234L, 
96L), sec20 = c(NA, NA, 117L, 285L, 172L, 63L, 52L, 225L), sec21 = c(NA, 
NA, 242L, 48L, 65L, NA, 129L, 188L), sec22 = c(NA, NA, 123L, 
202L, 179L, NA, 281L, 287L), sec23 = c(NA, NA, 171L, 268L, 61L, 
NA, 234L, 108L), sec24 = c(NA, NA, 90L, 282L, 212L, NA, 227L, 
192L), sec25 = c(NA, NA, 279L, 142L, 24L, NA, 228L, 193L), sec26 = c(NA, 
NA, 49L, 55L, 269L, NA, 127L, NA), sec27 = c(NA, NA, 235L, 207L, 
215L, NA, 44L, NA), sec28 = c(NA, NA, 30L, 80L, 247L, NA, 201L, 
NA), sec29 = c(NA, NA, 234L, 166L, 17L, NA, 61L, NA), sec30 = c(NA, 
NA, 262L, 112L, 195L, NA, 119L, NA), sec31 = c(NA, NA, 132L, 
39L, NA, NA, 216L, NA), sec32 = c(NA, NA, 144L, 121L, NA, NA, 
102L, NA), sec33 = c(NA, NA, 96L, 28L, NA, NA, 282L, NA), sec34 = c(NA, 
NA, 124L, 176L, NA, NA, 195L, NA), sec35 = c(NA, NA, 163L, 138L, 
NA, NA, 24L, NA), sec36 = c(NA, NA, 71L, 208L, NA, NA, 135L, 
NA), sec37 = c(NA, NA, 114L, 109L, NA, NA, 103L, NA), sec38 = c(NA, 
NA, 105L, 46L, NA, NA, 51L, NA), sec39 = c(NA, NA, 283L, 199L, 
NA, NA, 208L, NA), sec40 = c(NA, NA, 195L, 298L, NA, NA, 177L, 
NA), sec41 = c(NA, NA, 68L, 132L, NA, NA, 297L, NA), sec42 = c(NA, 
NA, 222L, 51L, NA, NA, 60L, NA), sec43 = c(NA, NA, 260L, 152L, 
NA, NA, 294L, NA), sec44 = c(NA, NA, 74L, 275L, NA, NA, 258L, 
NA), sec45 = c(NA, NA, 299L, 49L, NA, NA, 217L, NA), sec46 = c(NA, 
NA, 122L, 92L, NA, NA, 239L, NA), sec47 = c(NA, NA, 140L, 28L, 
NA, NA, 220L, NA), sec48 = c(NA, NA, 126L, 170L, NA, NA, 107L, 
NA), sec49 = c(NA, NA, 86L, 111L, NA, NA, 186L, NA), sec50 = c(NA, 
NA, 299L, 243L, NA, NA, 267L, NA), sec51 = c(NA, NA, 15L, 196L, 
NA, NA, 149L, NA), sec52 = c(NA, NA, 155L, 141L, NA, NA, 275L, 
NA), sec53 = c(NA, NA, 220L, 57L, NA, NA, 173L, NA), sec54 = c(NA, 
NA, 56L, 194L, NA, NA, 155L, NA), sec55 = c(NA, NA, 56L, 41L, 
NA, NA, 44L, NA), sec56 = c(NA, NA, NA, 44L, NA, NA, 68L, NA), 
    sec57 = c(NA, NA, NA, 194L, NA, NA, 16L, NA), sec58 = c(NA, 
    NA, NA, 226L, NA, NA, 42L, NA), sec59 = c(NA, NA, NA, 89L, 
    NA, NA, 125L, NA), sec60 = c(NA, NA, NA, 215L, NA, NA, 226L, 
    NA)), class = "data.frame", row.names = c(NA, -8L))

Upvotes: 0

Related Questions