Reputation: 146

Count consecutive not null values from a row in R

I have a data frame with numeric rows, and for each row I'd like to count the lengths of the runs of consecutive non-null (non-NA) values and take the mean of those lengths, as in the following example.

## Example data
dd <- data.frame(v1 = NA, v2 = 1,  v3  = 2,  v4 = 3,  v5  = NA, v6 = NA, v7 = 5,
           v8 = 4,  v9 = NA, v10 = NA, v11= NA, v12 = 6, v13 = 9, v14 = 7,
           v15 = 10)

x2 <- c(0, 1, 2, 3, NA, 1, 5, 4, NA, NA, 6, 6, 9, 7,NA)
dd <- rbind(dd, x2)
rownames(dd) <- c("id1","id2")

The rule I want to create (example for "id1") is:

#positions for v2, v3 and v4         = 3 non-null values
#positions for v7 and v8             = 2 non-null values
#positions for v12, v13, v14 and v15 = 4 non-null values

Final results

id1_non_nulls_mean = (3 + 2 + 4)/3 = 3

Thanks a lot for any help!

Upvotes: 1

Answers (2)

Señor O

Reputation: 17412

This should do it:

> dd
    v1 v2 v3 v4 v5 v6 v7 v8 v9 v10 v11 v12 v13 v14 v15
id1 NA  1  2  3 NA NA  5  4 NA  NA  NA   6   9   7  10
id2  0  1  2  3 NA  1  5  4 NA  NA   6   6   9   7  NA
> apply(dd, 1, function(x) {r = rle(!is.na(x)); mean(r$lengths[r$values])})
     id1      id2 
3.000000 3.666667

edit

Using Richard's suggestion makes it much simpler and more readable:

apply(dd, 1, function(x) with(rle(!is.na(x)), mean(lengths[values])))
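
To see why the rle() approach works, it may help to walk through a single row step by step. The sketch below rebuilds the example data from the question and inspects the "id1" row; the names x, r and run_lengths are just illustrative.

```r
# Example data from the question
dd <- data.frame(v1 = NA, v2 = 1, v3 = 2, v4 = 3, v5 = NA, v6 = NA, v7 = 5,
                 v8 = 4, v9 = NA, v10 = NA, v11 = NA, v12 = 6, v13 = 9,
                 v14 = 7, v15 = 10)
dd <- rbind(dd, c(0, 1, 2, 3, NA, 1, 5, 4, NA, NA, 6, 6, 9, 7, NA))
rownames(dd) <- c("id1", "id2")

# Take one row as a plain vector
x <- unname(unlist(dd["id1", ]))

# rle() collapses the TRUE/FALSE sequence into runs
r <- rle(!is.na(x))
r$lengths   # 1 3 2 2 3 4
r$values    # FALSE TRUE FALSE TRUE FALSE TRUE

# Keep only the runs of non-NA values (where values is TRUE)
run_lengths <- r$lengths[r$values]   # 3 2 4
mean(run_lengths)                    # 3
```

Note that a row consisting only of NA has no TRUE runs, so mean() is taken over an empty vector and returns NaN for that row.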

Upvotes: 3

bramtayl

Reputation: 4024

Here's a way to do this with reshaping.

library(tidyr)
library(dplyr)

dd %>%
  add_rownames %>%
  gather(variable, value, -rowname) %>%
  group_by(rowname) %>%
  mutate(group = 
           value %>% is.na %>% `!` %>%
           `&`(value %>% lag %>% is.na) %>%
           cumsum) %>%
  filter(value %>% is.na %>% `!`) %>%
  count(rowname, group) %>%
  summarize(average_n = mean(n))
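
As a side note, add_rownames() is deprecated in current dplyr and gather() has been superseded in tidyr. A rough equivalent with the newer verbs might look like the sketch below. It assumes the tibble package is installed alongside dplyr and tidyr, and the name res is just illustrative.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Example data from the question
dd <- data.frame(v1 = NA, v2 = 1, v3 = 2, v4 = 3, v5 = NA, v6 = NA, v7 = 5,
                 v8 = 4, v9 = NA, v10 = NA, v11 = NA, v12 = 6, v13 = 9,
                 v14 = 7, v15 = 10)
dd <- rbind(dd, c(0, 1, 2, 3, NA, 1, 5, 4, NA, NA, 6, 6, 9, 7, NA))
rownames(dd) <- c("id1", "id2")

res <- dd %>%
  rownames_to_column("rowname") %>%                      # replaces add_rownames
  pivot_longer(-rowname, names_to = "variable",
               values_to = "value") %>%                  # replaces gather
  group_by(rowname) %>%
  # a new run starts where a value is non-NA and the previous one is NA
  mutate(group = cumsum(!is.na(value) & is.na(lag(value)))) %>%
  filter(!is.na(value)) %>%
  count(rowname, group) %>%                              # length of each run
  group_by(rowname) %>%
  summarize(average_n = mean(n))

res
```

For the example data this should give the same per-row means as the rle() answer.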

Upvotes: 0
