Reputation: 1017
I have the following data frame:
a1=c(9,8,rep(NA,5))
a2=c(3,NA,3,NA,3,NA,4)
a3=c(11,6,7,NA,5,NA,NA)
k<-as.data.frame(rbind(a1,a2,a3))
I would like to add a column that gives the number of columns spanned from the first to the last non-NA value in each row. That is, for the first row the value in this additional column would be 2, for the second row it would be 7, and for the last it would be 5.
Upvotes: 2
Views: 36
Reputation: 887651
We could loop over the rows (apply with MARGIN = 1), get the indices of the non-NA elements (which + !is.na), extract the min/max (range), take the difference (diff) and add 1:
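k$new <- apply(k, 1, function(x) {
  i1 <- which(!is.na(x))   # positions of the non-NA values in this row
  i2 <- diff(range(i1))    # distance between the first and last position
  i2 + 1                   # count both endpoints
})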
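To make the steps inside that apply call easier to follow, here is a small sketch that walks through the third row by hand. The names x, idx, rng and span are only illustrative and are not part of the answer above.

```r
# Rebuild the example data from the question
a1 <- c(9, 8, rep(NA, 5))
a2 <- c(3, NA, 3, NA, 3, NA, 4)
a3 <- c(11, 6, 7, NA, 5, NA, NA)
k <- as.data.frame(rbind(a1, a2, a3))

# Take the third row as a plain numeric vector
x <- unlist(k[3, ])

idx  <- which(!is.na(x))   # positions of the non-NA values: 1 2 3 5
rng  <- range(idx)         # first and last position: 1 5
span <- diff(rng) + 1      # columns spanned, endpoints included: 5
span
```

The result, 5, matches the value the question expects for the last row.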
Or use max.col as a vectorized approach. Convert the data into a logical matrix, then apply max.col with ties.method set to "first" and to "last" to get the position of the first or last maximum value (TRUE -> 1 and FALSE -> 0) in each row. Because the matrix is logical, this amounts to finding the first and last TRUE positions in each row; subtract them and add 1:
max.col(!is.na(k), "last") - max.col(!is.na(k), "first") + 1
[1] 2 7 5
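The intermediate values of the max.col approach can be inspected with a short sketch like the one below. The names m, first_pos and last_pos are assumptions introduced here for illustration, and the sketch assumes every row has at least one non-NA value, as in the example data.

```r
# Rebuild the example data from the question
a1 <- c(9, 8, rep(NA, 5))
a2 <- c(3, NA, 3, NA, 3, NA, 4)
a3 <- c(11, 6, 7, NA, 5, NA, NA)
k <- as.data.frame(rbind(a1, a2, a3))

m <- !is.na(k)                                  # logical matrix, TRUE where a value is present
first_pos <- max.col(m, ties.method = "first")  # first TRUE in each row: 1 1 1
last_pos  <- max.col(m, ties.method = "last")   # last TRUE in each row:  2 7 5
res <- last_pos - first_pos + 1                 # columns spanned: 2 7 5
res
```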
Upvotes: 4
Reputation: 21938
This could also be done with the tidyverse:
library(dplyr)
library(purrr)
k %>%
  mutate(new = pmap_dbl(k, ~ {
    x <- which(!is.na(c(...)))   # positions of the non-NA values in the row
    max(x) - min(x) + 1          # pmap_dbl returns a numeric column, not a list
  }))
   V1 V2 V3 V4 V5 V6 V7 new
a1  9  8 NA NA NA NA NA   2
a2  3 NA  3 NA  3 NA  4   7
a3 11  6  7 NA  5 NA NA   5
Another option, using rowwise() and c_across():
k %>%
  rowwise() %>%
  mutate(new = {
    x <- which(!is.na(c_across(everything())))   # positions of the non-NA values
    range(x)[2] - range(x)[1] + 1
  })
# A tibble: 3 x 8
# Rowwise:
     V1    V2    V3    V4    V5    V6    V7   new
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1     9     8    NA    NA    NA    NA    NA     2
2     3    NA     3    NA     3    NA     4     7
3    11     6     7    NA     5    NA    NA     5
Upvotes: 2