Vasile

Reputation: 1017

Find the number of columns between the first and last non-NA value

I have the following data frame:

a1=c(9,8,rep(NA,5))
a2=c(3,NA,3,NA,3,NA,4)
a3=c(11,6,7,NA,5,NA,NA)
k<-as.data.frame(rbind(a1,a2,a3))

I would like to add a column that gives the number of columns from the first to the last non-NA value, counting both ends. That is, for the first row the value in this additional column would be 2, for the second row it would be 7, and for the third row it would be 5.

Upvotes: 2

Views: 36

Answers (2)

akrun

Reputation: 887651

We could loop over the rows (apply with MARGIN = 1), get the indices of the non-NA elements (which + !is.na), extract the min and max (range), take the difference and add 1:

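k$new <- apply(k, 1, function(x) {
  i1 <- which(!is.na(x))   # indices of the non-NA values in this row
  i2 <- diff(range(i1))    # last index minus first index
  i2 + 1                   # add 1 so both end columns are counted
})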
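One caveat with the apply version: if a row is entirely NA, which() returns an empty vector and range() then gives c(Inf, -Inf) with warnings, so the new column gets -Inf for that row. Below is a minimal sketch of a guarded variant, assuming you want NA for such rows; span_non_na is a hypothetical helper name, and the all-NA row a4 is added only for illustration:

```r
# Hypothetical helper: number of columns from the first to the last non-NA
# value in a vector, inclusive; returns NA when every value is NA.
span_non_na <- function(x) {
  idx <- which(!is.na(x))                  # positions of non-NA values
  if (length(idx) == 0) return(NA_integer_) # all-NA row: no span
  as.integer(max(idx) - min(idx) + 1L)
}

a1 <- c(9, 8, rep(NA, 5))
a2 <- c(3, NA, 3, NA, 3, NA, 4)
a3 <- c(11, 6, 7, NA, 5, NA, NA)
a4 <- rep(NA, 7)                           # illustrative extra row: all NA
k <- as.data.frame(rbind(a1, a2, a3, a4))

k$new <- apply(k, 1, span_non_na)
k$new
```

For the first three rows this gives the same 2, 7, 5 as above, and the all-NA row becomes NA instead of -Inf.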

Or, as a vectorized approach, use max.col. Convert the data into a logical matrix and call max.col with ties.method set to "first" and to "last" to get the position of the first or last maximum in each row (TRUE counts as 1 and FALSE as 0). Because the matrix is logical, this amounts to finding the first and last TRUE positions in each row; subtract them and add 1:

max.col(!is.na(k), "last") - max.col(!is.na(k), "first") + 1
[1] 2 7 5
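The max.col approach has a similar edge case: in an all-NA row every column ties at FALSE, so "first" returns 1 and "last" returns the number of columns, and the row is reported as spanning every column. A sketch that makes the intermediate positions explicit and masks such rows with NA, again assuming an illustrative all-NA row a4:

```r
a1 <- c(9, 8, rep(NA, 5))
a2 <- c(3, NA, 3, NA, 3, NA, 4)
a3 <- c(11, 6, 7, NA, 5, NA, NA)
a4 <- rep(NA, 7)                                # illustrative all-NA row
k <- as.data.frame(rbind(a1, a2, a3, a4))

m <- !is.na(k)                                  # TRUE where a value is present
first <- max.col(m, ties.method = "first")      # column of the first TRUE per row
last  <- max.col(m, ties.method = "last")       # column of the last TRUE per row

# In an all-FALSE row every column ties, so first = 1 and last = ncol(m);
# rowSums(m) == 0 identifies those rows, which are set to NA instead.
span <- ifelse(rowSums(m) > 0, last - first + 1L, NA_integer_)
span
```

rowSums() on the logical matrix counts the non-NA values per row, so the guard stays vectorized.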

Upvotes: 4

Anoushiravan R

Reputation: 21938

This could also be done with the tidyverse:

library(dplyr)
library(purrr)

k %>%
  mutate(new = pmap_dbl(k, ~ {
    x <- which(!is.na(c(...)))  # positions of the non-NA values in this row
    max(x) - min(x) + 1         # inclusive span from first to last
  }))

   V1 V2 V3 V4 V5 V6 V7 new
a1  9  8 NA NA NA NA NA   2
a2  3 NA  3 NA  3 NA  4   7
a3 11  6  7 NA  5 NA NA   5

Or with rowwise() and c_across():

k %>%
  rowwise() %>%
  mutate(new = {
    x <- which(!is.na(c_across(everything())))  # non-NA positions in this row
    diff(range(x)) + 1                          # inclusive span
  })

# A tibble: 3 x 8
# Rowwise: 
     V1    V2    V3    V4    V5    V6    V7   new
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1     9     8    NA    NA    NA    NA    NA     2
2     3    NA     3    NA     3    NA     4     7
3    11     6     7    NA     5    NA    NA     5

Upvotes: 2
