Reputation: 406
i'm trying to get a specific count from a previously created result set. I need a count of rows between rows that contain NA's. An aggregation of values of these rows are not of interest, only the count.
Below a quite simplified example which hopefully better explains what i'm talking about. At left hand the actual data and at right hand the desired result.
+------+-------+---+------+--------+ | TIME | Value | - | TIME | Result | +------+-------+---+------+--------+ | 10 | NA | - | 20 | 2 | | 20 | 0 | - | 60 | 3 | | 30 | 1 | - | | | | 40 | NA | - | | | | 50 | NA | - | | | | 60 | 30 | - | | | | 70 | 68 | - | | | | 80 | 0 | - | | | | 90 | NA | - | | | +------+-------+---+------+--------+
Any comments are welcome. In case additional input is needed please leave a message.
Upvotes: 5
Views: 900
Reputation: 70316
This is perhaps not the most straightforward way to do it, but it does produce the desired result and since I wrote it, I thought I might post it, too (using the sample data by @konvas):
require(dplyr)
dat %>%
group_by(m = cumsum(is.na(value))) %>%
summarise(n = n() -1, time = first(time[!is.na(value)])) %>%
ungroup() %>%
filter(n > 0 & m > 0) %>%
select(-m)
#Source: local data frame [2 x 2]
#
# n time
#1 2 20
#2 3 60
Edit: I made a small correction in response to Ananda's comment, hope it works better now. For example, if the data was:
dat <- data.frame(time = seq(10, 90, 10), value = c(0, 2, 1, NA, NA, 30, 68, 0, NA))
dat
# time value
#1 10 0
#2 20 2
#3 30 1
#4 40 NA
#5 50 NA
#6 60 30
#7 70 68
#8 80 0
#9 90 NA
The code would result in:
dat %>%
group_by(m = cumsum(is.na(value))) %>%
summarise(n = n() -1, time = first(time[!is.na(value)])) %>%
ungroup() %>%
filter(n > 0 & m > 0) %>%
select(-m)
#Source: local data frame [1 x 2]
#
# n time
#1 3 60
Upvotes: 3
Reputation: 14366
Apart from rle
, you can also use a combination of diff
, which
and is.na
:
dat <- data.frame(time = seq(10, 90, 10), value = c(NA, 0, 1, NA, NA, 30, 68, 0, NA))
res <- data.frame(result = diff(which(is.na(dat$value))) - 1)
res$time <- dat$time[which(is.na(dat$value)) + 1][1:nrow(res)]
res[res$result != 0, ]
# time result
# 20 2
# 60 3
Upvotes: 6
Reputation: 193657
My "SOfun" package has a function called TrueSeq
that is like a group-maker with logical vectors. You can use that function in conjunction with "data.table" to get the result you want, like this:
library(SOfun)
library(data.table)
na.omit(data.table(TIME = df$TIME, Val = TrueSeq(
!is.na(df$value), zero2NA = TRUE)))[, list(TIME = TIME[1], .N), by = Val]
# Val TIME N
# 1: 1 20 2
# 2: 2 60 3
If you have "devtools" installed, you can install "SOfun" with:
library(devtools)
install_github("mrdwab/SOfun")
For reference, I've posted this Gist to be able to compare the results from the different approaches at this answer.
To summarize:
NA
:
NA
:
NA
value (thus, the first row of the input) as the first row of the results.NA
value as the first row of the results.Upvotes: 5
Reputation: 9618
In order to make my answer complete here a modified version:
d <- data.frame( TIME = seq(10, 90, by = 10), Value = c(NA, 0, 1, NA, NA, 30, 68, 0, NA))
aux <- rle(as.numeric((!is.na(d[,2]))))
cbind(TIME = d[cumsum(aux$lengths)[which(aux$values == 1)] - aux$lengths[aux$values == 1] +1, 1],
Result = rle(is.na(d$Value))$lengths[!rle(is.na(d$Value))$values])
TIME Result
[1,] 2 20
[2,] 3 60
Upvotes: 6