P.Hubert
P.Hubert

Reputation: 147

R-Count n for each variable ignoring NA

I have the following data structure:

id  val1 val2 val3
1   1     5    2 
1   2     7    5
2   2     2    NA
2   NA    4    NA
3   4     8    4
3   3     NA   4

I want to count the number of oberservations for each variable, excluding the NA values. Result should be like:

id  val1 val2 val3
1   2     2    2 
2   1     2    NA
3   2     1    2

I tried the following lines but it would not work:

df %>% 
  group_by(id) %>% 
  summarise_all(funs(n(., na.rm = TRUE)))

Hoping for your suggestions and advice.

Upvotes: 3

Views: 250

Answers (3)

ThomasIsCoding
ThomasIsCoding

Reputation: 101337

Another base R option using ´aggregate`

> aggregate(!is.na(df[-1]), df[1], sum)
  id val1 val2 val3
1  1    2    2    2
2  2    1    2    0
3  3    2    1    2

Data

> dput(df)
structure(list(id = c(1L, 1L, 2L, 2L, 3L, 3L), val1 = c(1L, 2L, 
2L, NA, 4L, 3L), val2 = c(5L, 7L, 2L, 4L, 8L, NA), val3 = c(2L,
5L, NA, NA, 4L, 4L)), class = "data.frame", row.names = c(NA,
-6L))

Upvotes: 1

akrun
akrun

Reputation: 887088

We can use data.table approach

library(data.table)
setDT(df)[, lapply(.SD, function(x) sum(!is.na(x))), id]

The summarise_all should be

library(dplyr)
df %>% 
   group_by(id) %>% 
   summarise_all(~  sum(!is.na(.), na.rm = TRUE))

Or using collapse

library(collapse)
fsum(!is.na(slt(df,-id)), g = GRP(df, ~ id))
#  val1 val2 val3
#1    2    2    2
#2    1    2    0
#3    2    1    2

Upvotes: 1

Ronak Shah
Ronak Shah

Reputation: 388982

You can use sum(!is.na(.)) to count non-NA values.

library(dplyr)
df %>% group_by(id) %>% summarise(across(.fns = ~sum(!is.na(.))))

#     id  val1  val2  val3
#  <int> <int> <int> <int>
#1     1     2     2     2
#2     2     1     2     0
#3     3     2     1     2

In base R :

aggregate(.~id, df, function(x) sum(!is.na(x)), na.action = 'na.keep')

Upvotes: 4

Related Questions