Reputation: 333
My data set has 60 columns and several hundred observations. Each observation has a certain length (depending on the length of the video that has been analyzed) and the last few columns might be just NA. I want to be able to average a portion of values for each row. For example, if the video length is 15 seconds, I need to average the first 3 seconds ( a fifth of the row) and if it is 60 seconds, I need the first 12 seconds average.
obs | veideolength | sec1 | sec2 | sec3 | sec4 | sec5 | sec6 | sec7 | ... | sec60 |
---|---|---|---|---|---|---|---|---|---|---|
obs1 | 10 | 15 | 251 | 281 | 249 | 294 | 278 | 249 | ... | na |
obs2 | 5 | 205 | 182 | 164 | 178 | 252 | na | na | ... | na |
obs3 | 55 | 157 | 270 | 277 | 258 | 233 | 242 | 181 | ... | na |
obs4 | 60 | 169 | 194 | 154 | 173 | 237 | 214 | 257 | ... | 187 |
obs5 | 30 | 187 | 159 | 222 | 235 | 275 | 196 | 169 | ... | na |
obs6 | 20 | 198 | 254 | 227 | 247 | 210 | 193 | 289 | ... | na |
obs7 | 60 | 198 | 271 | 225 | 157 | 205 | 192 | 170 | ... | 223 |
obs8 | 25 | 261 | 240 | 263 | 230 | 153 | 267 | 249 | ... | na |
… |
I have tried rowMeans
but the problem is it does not accept a variable inside its arguments.
df$average1 <-rowMeans(df[,3:(3+floor(df$videolength/5))])
I also have tried for loop, but the variable j
does not update and remains the first variable that it has been assigned.
for(i in 1:nrow(df)){
j = (floor(df$VideoLength[i]/5)-1)
frames1$average1 <-rowMeans(df[,3:(3+j)])
}
Upvotes: 1
Views: 119
Reputation: 389275
Since you want to take mean of different number of columns for each row you cannot use rowMeans
directly here. Here is one way using apply
in a rowwise manner.
df$average <- apply(df[-1], 1, function(x) {
mean(x[-1][seq_len(ceiling(x[1]/5))])
})
This is assuming your second column is veideolength
and everything from 3rd column are the second columns.
Upvotes: 1
Reputation: 440
Just as an alternative to mapply
, I personally find the purrr
solution a lot more readable, if you are open to using the tidyverse.
library(purrr)
library(dplyr)
pmap(d, function(...) {
current <- data.frame(...)
cols <- floor(current$videolength/5)-1
current[3:(3+cols)] %>% unlist() %>% mean()
})
Upvotes: 0
Reputation: 73692
In an mapply
, for the first argument we transpose the interesting values of the data frame, and for the second argument we make seq
uences of length videolength/5
. With the second we subset the former and calculate mean
, mapply
does this subsequently.
res <- mapply(function(x, y) mean(x[y]),
unname(as.data.frame(t(d[-(1:2)]))), ## transpose
Map(seq, 1, d$videolength/5))
res
# [1] 106.0000 123.0000 178.7273 179.2500 118.6667 231.0000 176.8333 204.4000
Data:
d <- structure(list(obs = c("obs1", "obs2", "obs3", "obs4", "obs5",
"obs6", "obs7", "obs8"), videolength = c(15, 5, 55, 60, 30, 20,
60, 25), sec1 = c(63L, 123L, 172L, 262L, 76L, 253L, 153L, 249L
), sec2 = c(167L, 19L, 150L, 265L, 283L, 298L, 170L, 279L), sec3 = c(88L,
226L, 160L, 171L, 138L, 266L, 288L, 152L), sec4 = c(242L, 273L,
123L, 39L, 77L, 107L, 69L, 213L), sec5 = c(160L, 38L, 211L, 205L,
63L, 98L, 240L, 129L), sec6 = c(136L, NA, 18L, 46L, 75L, 239L,
264L, 96L), sec7 = c(63L, NA, 240L, 252L, 130L, 75L, 184L, 122L
), sec8 = c(142L, NA, 229L, 28L, 271L, 116L, 162L, 216L), sec9 = c(38L,
NA, 259L, 125L, 42L, 185L, 221L, 87L), sec10 = c(103L, NA, 128L,
276L, 16L, 164L, 203L, 161L), sec11 = c(179L, NA, 276L, 222L,
173L, 196L, 127L, 60L), sec12 = c(124L, NA, 144L, 260L, 150L,
69L, 41L, 261L), sec13 = c(34L, NA, 17L, 256L, 141L, 68L, 108L,
145L), sec14 = c(103L, NA, 272L, 238L, 222L, 187L, 147L, 164L
), sec15 = c(297L, NA, 200L, 276L, 273L, 57L, 100L, 75L), sec16 = c(NA,
NA, 152L, 228L, 243L, 221L, 128L, 34L), sec17 = c(NA, NA, 54L,
109L, 283L, 99L, 264L, 184L), sec18 = c(NA, NA, 19L, 20L, 26L,
167L, 183L, 31L), sec19 = c(NA, NA, 47L, 266L, 207L, 254L, 234L,
96L), sec20 = c(NA, NA, 117L, 285L, 172L, 63L, 52L, 225L), sec21 = c(NA,
NA, 242L, 48L, 65L, NA, 129L, 188L), sec22 = c(NA, NA, 123L,
202L, 179L, NA, 281L, 287L), sec23 = c(NA, NA, 171L, 268L, 61L,
NA, 234L, 108L), sec24 = c(NA, NA, 90L, 282L, 212L, NA, 227L,
192L), sec25 = c(NA, NA, 279L, 142L, 24L, NA, 228L, 193L), sec26 = c(NA,
NA, 49L, 55L, 269L, NA, 127L, NA), sec27 = c(NA, NA, 235L, 207L,
215L, NA, 44L, NA), sec28 = c(NA, NA, 30L, 80L, 247L, NA, 201L,
NA), sec29 = c(NA, NA, 234L, 166L, 17L, NA, 61L, NA), sec30 = c(NA,
NA, 262L, 112L, 195L, NA, 119L, NA), sec31 = c(NA, NA, 132L,
39L, NA, NA, 216L, NA), sec32 = c(NA, NA, 144L, 121L, NA, NA,
102L, NA), sec33 = c(NA, NA, 96L, 28L, NA, NA, 282L, NA), sec34 = c(NA,
NA, 124L, 176L, NA, NA, 195L, NA), sec35 = c(NA, NA, 163L, 138L,
NA, NA, 24L, NA), sec36 = c(NA, NA, 71L, 208L, NA, NA, 135L,
NA), sec37 = c(NA, NA, 114L, 109L, NA, NA, 103L, NA), sec38 = c(NA,
NA, 105L, 46L, NA, NA, 51L, NA), sec39 = c(NA, NA, 283L, 199L,
NA, NA, 208L, NA), sec40 = c(NA, NA, 195L, 298L, NA, NA, 177L,
NA), sec41 = c(NA, NA, 68L, 132L, NA, NA, 297L, NA), sec42 = c(NA,
NA, 222L, 51L, NA, NA, 60L, NA), sec43 = c(NA, NA, 260L, 152L,
NA, NA, 294L, NA), sec44 = c(NA, NA, 74L, 275L, NA, NA, 258L,
NA), sec45 = c(NA, NA, 299L, 49L, NA, NA, 217L, NA), sec46 = c(NA,
NA, 122L, 92L, NA, NA, 239L, NA), sec47 = c(NA, NA, 140L, 28L,
NA, NA, 220L, NA), sec48 = c(NA, NA, 126L, 170L, NA, NA, 107L,
NA), sec49 = c(NA, NA, 86L, 111L, NA, NA, 186L, NA), sec50 = c(NA,
NA, 299L, 243L, NA, NA, 267L, NA), sec51 = c(NA, NA, 15L, 196L,
NA, NA, 149L, NA), sec52 = c(NA, NA, 155L, 141L, NA, NA, 275L,
NA), sec53 = c(NA, NA, 220L, 57L, NA, NA, 173L, NA), sec54 = c(NA,
NA, 56L, 194L, NA, NA, 155L, NA), sec55 = c(NA, NA, 56L, 41L,
NA, NA, 44L, NA), sec56 = c(NA, NA, NA, 44L, NA, NA, 68L, NA),
sec57 = c(NA, NA, NA, 194L, NA, NA, 16L, NA), sec58 = c(NA,
NA, NA, 226L, NA, NA, 42L, NA), sec59 = c(NA, NA, NA, 89L,
NA, NA, 125L, NA), sec60 = c(NA, NA, NA, 215L, NA, NA, 226L,
NA)), class = "data.frame", row.names = c(NA, -8L))
Upvotes: 0