rle(): Return average of lengths only if values == TRUE

Question

I have the following rle object:

Run Length Encoding
  lengths: int [1:189] 4 5 3 15 6 4 9 1 9 5 ...
  values : logi [1:189] FALSE TRUE FALSE TRUE FALSE TRUE ...

I would like to find the mean of the lengths for the runs where the corresponding value is TRUE (I'm not interested in the lengths where the value is FALSE). This is what I have so far:

df <- data.frame(values = NoOfTradesAndLength$values, lengths = NoOfTradesAndLength$lengths)
AveLength <- aggregate(lengths ~ values, data = df, FUN = mean)

Which returns this:

  values  lengths
1  FALSE 7.694737
2   TRUE 5.287234

I can now read off the mean length where values == TRUE, but is there a nicer way of doing this? Or could I achieve a similar result without using rle at all? Converting from a list to a data frame feels fiddly, and I'm sure there is a clever one-line way of doing this. I've seen variations of this question before, but I couldn't come up with anything better from those, so your help is much appreciated.

akrun · Accepted Answer

rle() returns a list with two components, 'lengths' and 'values'. Because 'values' is logical here, we can use it directly as a logical index into 'lengths' and take the mean of the result:

with(NoOfTradesAndLength, mean(lengths[values]))

Using a reproducible example:

set.seed(24)
NoOfTradesAndLength <- rle(sample(c(TRUE, FALSE), 25, replace=TRUE))
with(NoOfTradesAndLength, mean(lengths[values]))
#[1] 1.5
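
To see why the logical index works, here is a small sketch that can be checked by hand. The input vector `x` and the names `r`, `true_runs` and `mean_true_run` are made up for illustration and do not come from the question.

```r
# Hypothetical input, chosen so the runs are easy to count by hand
x <- c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE)

r <- rle(x)
r$lengths   # 2 1 4 2 1
r$values    # TRUE FALSE TRUE FALSE TRUE

# Logical indexing keeps only the lengths of the TRUE runs: 2, 4, 1
true_runs <- r$lengths[r$values]

# Mean length of the TRUE runs: (2 + 4 + 1) / 3
mean_true_run <- mean(true_runs)
mean_true_run
```

This assumes 'values' contains no NA. If it might, wrapping the index in which(), as in r$lengths[which(r$values)], is one way to drop the missing entries before averaging.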

Rerunning the OP's aggregate() code on this example gives the same value:

AveLength[2,]
#  values lengths
#2   TRUE     1.5
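
If the means for both TRUE and FALSE runs are wanted, as in the aggregate() output, one possible alternative that skips the data frame is tapply(). This is only a sketch on a made-up vector; `x`, `r`, `run_means` and `mean_true` are hypothetical names.

```r
# Hypothetical input: TRUE runs of length 2, 4, 1 and FALSE runs of length 1, 2
x <- c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE)
r <- rle(x)

# Group the run lengths by their value and average each group;
# the result is named "FALSE" and "TRUE"
run_means <- with(r, tapply(lengths, values, mean))
run_means

# Pick out the TRUE entry by name
mean_true <- unname(run_means["TRUE"])
mean_true
```

For the single TRUE mean, the logical-index one-liner is shorter; tapply() is only worth it when the per-group summary is wanted.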
