Reputation: 816
I have a series of dates and I want to count each record the sequence of dates, while skipping missing values.
Essentially, I want to see the following result, where a
are my dates and b
is my index of the date record. You can see that row 5 is my 4th record, and visit 7 is my 5th record.
tibble(a = c(12, 24, 32, NA, 55, NA, 73), b = c(1, 2, 3, NA, 4, NA, 5))
a b
<dbl> <dbl>
1 12 1
2 24 2
3 32 3
4 NA NA
5 55 4
6 NA NA
7 73 5
It seems that group_by() %>% mutate(sq = sequence(n()))
doesn't work in this case, because I don't know how to filter out the missing values while counting. I need to keep those missing values because my data is pretty large.
Is a separate operation of filtering the data, getting the sequence, and using left_join
my best option?
Upvotes: 1
Views: 321
Reputation: 269644
Cumulatively sum an indicator of non-NA and then add 0*a
to effectively NA out any component that was originally NA while adding 0 to the rest (so not changing them).
a <- c(12, 24, 32, NA, 55, NA, 73)
cumsum(!is.na(a)) + 0 * a
## [1] 1 2 3 NA 4 NA 5
Upvotes: 3
Reputation: 887118
We could specify the i
as non-NA logical vector, and create the 'b' by assigning the sequence of rows
library(data.table)
setDT(dat)[!is.na(a), b := seq_len(.N)]
-output
dat
# a b
#1: 12 1
#2: 24 2
#3: 32 3
#4: NA NA
#5: 55 4
#6: NA NA
#7: 73 5
Upvotes: 1
Reputation: 101373
Maybe you can try replace
+ seq_along
like below
within(
df,
b <- replace(a, !is.na(a), seq_along(na.omit(a)))
)
Upvotes: 1
Reputation: 28850
library(dplyr)
dat <- tibble(a = c(12, 24, 32, NA, 55, NA, 73))
dat %>%
mutate(sq = ifelse(is.na(a), NA, cumsum(!is.na(a))))
#> # A tibble: 7 x 2
#> a sq
#> <dbl> <int>
#> 1 12 1
#> 2 24 2
#> 3 32 3
#> 4 NA NA
#> 5 55 4
#> 6 NA NA
#> 7 73 5
Upvotes: 3