My data set has 60 columns and several hundred observations. Each observation has a certain length (depending on the length of the video that has been analyzed) and the last few columns might be just NA. I want to be able to average a portion of values for each row. For example, if the video length is 15 seconds, I need to average the first 3 seconds ( a fifth of the row) and if it is 60 seconds, I need the first 12 seconds average. obs veideolength sec1 sec2 sec3 sec4 sec5 sec6 sec7 ... sec60 obs1 10 15 251 281 249 294 278 249 ... na obs2 5 205 182 164 178 252 na na ... na obs3 55 157 270 277 258 233 242 181 ... na obs4 60 169 194 154 173 237 214 257 ... 187 obs5 30 187 159 222 235 275 196 169 ... na obs6 20 198 254 227 247 210 193 289 ... na obs7 60 198 271 225 157 205 192 170 ... 223 obs8 25 261 240 263 230 153 267 249 ... na … I have tried rowMeans but the problem is it does not accept a variable inside its arguments. df$average1 <-rowMeans(df[,3:(3+floor(df$videolength/5))]) I also have tried for loop, but the variable j does not update and remains the first variable that it has been assigned. for(i in 1:nrow(df)){ j = (floor(df$VideoLength[i]/5)-1) frames1$average1 <-rowMeans(df[,3:(3+j)]) }

Reputation: 333

Averaging rows with changing number of columns

My data set has 60 columns and several hundred observations. Each observation has a certain length (depending on the length of the video that has been analyzed) and the last few columns might be just NA. I want to be able to average a portion of values for each row. For example, if the video length is 15 seconds, I need to average the first 3 seconds ( a fifth of the row) and if it is 60 seconds, I need the first 12 seconds average.

obs	veideolength	sec1	sec2	sec3	sec4	sec5	sec6	sec7	...	sec60
obs1	10	15	251	281	249	294	278	249	...	na
obs2	5	205	182	164	178	252	na	na	...	na
obs3	55	157	270	277	258	233	242	181	...	na
obs4	60	169	194	154	173	237	214	257	...	187
obs5	30	187	159	222	235	275	196	169	...	na
obs6	20	198	254	227	247	210	193	289	...	na
obs7	60	198	271	225	157	205	192	170	...	223
obs8	25	261	240	263	230	153	267	249	...	na
…

I have tried rowMeans but the problem is it does not accept a variable inside its arguments.

df$average1 <-rowMeans(df[,3:(3+floor(df$videolength/5))])

I also have tried for loop, but the variable j does not update and remains the first variable that it has been assigned.

for(i in 1:nrow(df)){
  j = (floor(df$VideoLength[i]/5)-1)
  frames1$average1 <-rowMeans(df[,3:(3+j)])
}

Upvotes: 1

Answers (3)

Ronak Shah

Reputation: 389275

Since you want to take mean of different number of columns for each row you cannot use rowMeans directly here. Here is one way using apply in a rowwise manner.

df$average <- apply(df[-1], 1, function(x) {
  mean(x[-1][seq_len(ceiling(x[1]/5))])
})

This is assuming your second column is veideolength and everything from 3rd column are the second columns.

Upvotes: 1

Lukas Wallrich

Reputation: 440

Just as an alternative to mapply, I personally find the purrr solution a lot more readable, if you are open to using the tidyverse.

library(purrr)
library(dplyr)

pmap(d, function(...) {
  current <- data.frame(...)
  cols <- floor(current$videolength/5)-1
  current[3:(3+cols)] %>% unlist() %>% mean()
})

Upvotes: 0

jay.sf

Reputation: 73692

In an mapply, for the first argument we transpose the interesting values of the data frame, and for the second argument we make sequences of length videolength/5. With the second we subset the former and calculate mean, mapply does this subsequently.

res <- mapply(function(x, y) mean(x[y]),
              unname(as.data.frame(t(d[-(1:2)]))),  ## transpose 
              Map(seq, 1, d$videolength/5))
res
# [1] 106.0000 123.0000 178.7273 179.2500 118.6667 231.0000 176.8333 204.4000

Data:

d <- structure(list(obs = c("obs1", "obs2", "obs3", "obs4", "obs5", 
"obs6", "obs7", "obs8"), videolength = c(15, 5, 55, 60, 30, 20, 
60, 25), sec1 = c(63L, 123L, 172L, 262L, 76L, 253L, 153L, 249L
), sec2 = c(167L, 19L, 150L, 265L, 283L, 298L, 170L, 279L), sec3 = c(88L, 
226L, 160L, 171L, 138L, 266L, 288L, 152L), sec4 = c(242L, 273L, 
123L, 39L, 77L, 107L, 69L, 213L), sec5 = c(160L, 38L, 211L, 205L, 
63L, 98L, 240L, 129L), sec6 = c(136L, NA, 18L, 46L, 75L, 239L, 
264L, 96L), sec7 = c(63L, NA, 240L, 252L, 130L, 75L, 184L, 122L
), sec8 = c(142L, NA, 229L, 28L, 271L, 116L, 162L, 216L), sec9 = c(38L, 
NA, 259L, 125L, 42L, 185L, 221L, 87L), sec10 = c(103L, NA, 128L, 
276L, 16L, 164L, 203L, 161L), sec11 = c(179L, NA, 276L, 222L, 
173L, 196L, 127L, 60L), sec12 = c(124L, NA, 144L, 260L, 150L, 
69L, 41L, 261L), sec13 = c(34L, NA, 17L, 256L, 141L, 68L, 108L, 
145L), sec14 = c(103L, NA, 272L, 238L, 222L, 187L, 147L, 164L
), sec15 = c(297L, NA, 200L, 276L, 273L, 57L, 100L, 75L), sec16 = c(NA, 
NA, 152L, 228L, 243L, 221L, 128L, 34L), sec17 = c(NA, NA, 54L, 
109L, 283L, 99L, 264L, 184L), sec18 = c(NA, NA, 19L, 20L, 26L, 
167L, 183L, 31L), sec19 = c(NA, NA, 47L, 266L, 207L, 254L, 234L, 
96L), sec20 = c(NA, NA, 117L, 285L, 172L, 63L, 52L, 225L), sec21 = c(NA, 
NA, 242L, 48L, 65L, NA, 129L, 188L), sec22 = c(NA, NA, 123L, 
202L, 179L, NA, 281L, 287L), sec23 = c(NA, NA, 171L, 268L, 61L, 
NA, 234L, 108L), sec24 = c(NA, NA, 90L, 282L, 212L, NA, 227L, 
192L), sec25 = c(NA, NA, 279L, 142L, 24L, NA, 228L, 193L), sec26 = c(NA, 
NA, 49L, 55L, 269L, NA, 127L, NA), sec27 = c(NA, NA, 235L, 207L, 
215L, NA, 44L, NA), sec28 = c(NA, NA, 30L, 80L, 247L, NA, 201L, 
NA), sec29 = c(NA, NA, 234L, 166L, 17L, NA, 61L, NA), sec30 = c(NA, 
NA, 262L, 112L, 195L, NA, 119L, NA), sec31 = c(NA, NA, 132L, 
39L, NA, NA, 216L, NA), sec32 = c(NA, NA, 144L, 121L, NA, NA, 
102L, NA), sec33 = c(NA, NA, 96L, 28L, NA, NA, 282L, NA), sec34 = c(NA, 
NA, 124L, 176L, NA, NA, 195L, NA), sec35 = c(NA, NA, 163L, 138L, 
NA, NA, 24L, NA), sec36 = c(NA, NA, 71L, 208L, NA, NA, 135L, 
NA), sec37 = c(NA, NA, 114L, 109L, NA, NA, 103L, NA), sec38 = c(NA, 
NA, 105L, 46L, NA, NA, 51L, NA), sec39 = c(NA, NA, 283L, 199L, 
NA, NA, 208L, NA), sec40 = c(NA, NA, 195L, 298L, NA, NA, 177L, 
NA), sec41 = c(NA, NA, 68L, 132L, NA, NA, 297L, NA), sec42 = c(NA, 
NA, 222L, 51L, NA, NA, 60L, NA), sec43 = c(NA, NA, 260L, 152L, 
NA, NA, 294L, NA), sec44 = c(NA, NA, 74L, 275L, NA, NA, 258L, 
NA), sec45 = c(NA, NA, 299L, 49L, NA, NA, 217L, NA), sec46 = c(NA, 
NA, 122L, 92L, NA, NA, 239L, NA), sec47 = c(NA, NA, 140L, 28L, 
NA, NA, 220L, NA), sec48 = c(NA, NA, 126L, 170L, NA, NA, 107L, 
NA), sec49 = c(NA, NA, 86L, 111L, NA, NA, 186L, NA), sec50 = c(NA, 
NA, 299L, 243L, NA, NA, 267L, NA), sec51 = c(NA, NA, 15L, 196L, 
NA, NA, 149L, NA), sec52 = c(NA, NA, 155L, 141L, NA, NA, 275L, 
NA), sec53 = c(NA, NA, 220L, 57L, NA, NA, 173L, NA), sec54 = c(NA, 
NA, 56L, 194L, NA, NA, 155L, NA), sec55 = c(NA, NA, 56L, 41L, 
NA, NA, 44L, NA), sec56 = c(NA, NA, NA, 44L, NA, NA, 68L, NA), 
    sec57 = c(NA, NA, NA, 194L, NA, NA, 16L, NA), sec58 = c(NA, 
    NA, NA, 226L, NA, NA, 42L, NA), sec59 = c(NA, NA, NA, 89L, 
    NA, NA, 125L, NA), sec60 = c(NA, NA, NA, 215L, NA, NA, 226L, 
    NA)), class = "data.frame", row.names = c(NA, -8L))

Upvotes: 0

Averaging rows with changing number of columns

Answers (3)

Related Questions