Geet
Geet

Reputation: 2575

Count NAs between first and last occured numbers

Here is my toy dataset

df <- tribble(
  ~x, ~y, ~z,
  7,   NA, 4,
  8,   2,  NA,
  NA,  NA, NA,
  NA,  4,  6)

I want to get a dataframe with a number of NAs for each variable only between the first and the last occurrence of numbers in each column and number of NAs between the first occurred number and last row. So, for this example, the desired solution is

desired_df <- tribble(~vars, ~na_count_between_1st_last_num, ~na_count_between_1st_num_last_row,
                       "x",     0,                              2,
                       "y",     1,                              1,
                       "z",     2,                              2)

How can I get the desired output?

Upvotes: 6

Views: 169

Answers (4)

chinsoon12
chinsoon12

Reputation: 25225

Here is another option using data.table::nafill:

library(data.table)
natrail <- colSums(is.na(as.data.table(nafill(df, "nocb"))))
nastart <- colSums(is.na(as.data.table(nafill(df, "locf"))))    
n1last <- nrow(df) - colSums(!is.na(df)) - nastart
n1num <- n1last - natrail

cbind(na_count_between_1st_last_num=n1num, na_count_between_1st_num_last_row=n1last)

output:

  na_count_between_1st_last_num na_count_between_1st_num_last_row
x                             0                                 2
y                             1                                 1
z                             2                                 2
a                             1                                 2
b                             0                                 0
d                             0                                 1

data:

df <- data.frame(x=c(7,8,NA,NA), #0 2
    y=c(NA, 2, NA, 4),           #1 1
    z=c(4, NA, NA, 6),           #2 2
    a=c(1, NA, 1, NA),           #1 2
    b=c(NA, NA, 1, 1),           #0 0
    d=c(NA, 1, 1, NA))           #0 1

Upvotes: 1

G. Grothendieck
G. Grothendieck

Reputation: 269704

na.trim trims NAs off both ends or just the left or right end if we specify sides="left" or sides="right" so:

library(dplyr)
library(tibble)
library(tidyr)
library(zoo)

df %>%
  pivot_longer(everything()) %>%
  group_by(name) %>%
  summarize(na1 = sum(is.na(na.trim(value))), 
            na2 = sum(is.na(na.trim(value, "left")))) %>%
  ungroup

giving:

# A tibble: 3 x 3
  name    na1   na2
  <chr> <int> <int>
1 x         0     2
2 y         1     1
3 z         2     2

Upvotes: 4

Sotos
Sotos

Reputation: 51592

Here is an idea via base R,

f1 <- function(x) {i1 <- which(!is.na(x)); head(i1, 1):tail(i1, 1) }
f2 <- function(x) {i1 <- which(!is.na(x)); head(i1, 1):length(x) }

merge(stack(sapply(df, function(i) sum(is.na(i[f1(i)])))), 
      stack(sapply(df, function(i) sum(is.na(i[f2(i)])))), by = 'ind')

#  ind values.x values.y
#1   x        0        2
#2   y        1        1
#3   z        2        2

Upvotes: 7

Cettt
Cettt

Reputation: 11981

Here is one possibility using two functions:

fun1 <- function(x) { #count NA between first and last non NA
  idx1 <- cumsum(!is.na(x)) > 0 #identify leading NA
  idx2 <- rev(cumsum(!is.na(rev(x))) > 0) #identify trailing NA
  sum(is.na(x[idx1 & idx2]))
}


fun2 <- function(x) {#count NA between first non-NA and last element
  idx1 <- cumsum(!is.na(x)) > 0 #identify leading NA
  sum(is.na(x[idx1]))
}

Afterwards you just summarise your data.frame and reshape it:

df %>% summarise_all(list(m1 = ~fun1(.), m2 = ~fun2(.))) %>%
  pivot_longer(cols = everything(), names_pattern = "^(.)_(.*)$", names_to = c("vars", "a"),
               values_to = "x") %>%
  spread(a, x)

# A tibble: 3 x 3
  vars     m1    m2
  <chr> <int> <int>
1 x         0     2
2 y         1     1
3 z         2     2

Upvotes: 3

Related Questions