Sharif
Sharif

Reputation: 391

counting the number of observations row wise using dplyr

I have a dataset look like this -

sample <- tibble(x = c (1,2,3,NA), y = c (5, NA,2, NA))

sample
# A tibble: 4 x 2
      x     y
  <dbl> <dbl>
1     1     5
2     2    NA
3     3     2
4    NA    NA

Now I want create a new variable Z, which will count how many observations are in each row. For example for the sample dataset above the first value of new variable Z should be 2 because both x and y have values. Similarly, for 2nd row the value of Z is 1 as there is one missing value and for 4th row, the value is 0 as there is no observations in the row.

The expected dataset looks like this -

      x     y   z
  <dbl> <dbl> <dbl>
1     1     5  2
2     2    NA  1
3     3     2  2
4    NA    NA  0

I want to do this on few number of variables, not the whole dataset.

Upvotes: 0

Views: 724

Answers (3)

SteveM
SteveM

Reputation: 2301

apply function with selected columns example:

set.seed(7)
vals <- sample(c(1:20, NA, NA), 20)
sample <- matrix(vals, ncol = 5)

# Select columns 1, 3, 4

cols <- c(1, 3, 4)
rowcnts <- apply(sample[ , cols], 1, function(x) length(x[!is.na(x)]))
sample <- cbind(sample, rowcnts)

> sample
                    rowcnts
[1,] 10 15 16 NA 12       2
[2,] 19  8 14 18  9       3
[3,]  7 17  6  4  1       3
[4,]  2  3 13 NA  5       2

Upvotes: 0

manotheshark
manotheshark

Reputation: 4357

Using base R. First line checks all columns, second one checks columns by name, third might not work as good if the number of columns is substantial.

sample$z1 <- rowSums(!is.na(sample))
sample$z2 <- rowSums(!is.na(sample[c("x", "y")]))
sample$z3 <- is.finite(sample$x) + is.finite(sample$y)

> sample
# A tibble: 4 x 5
      x     y    z1    z2    z3
  <dbl> <dbl> <dbl> <dbl> <int>
1     1     5     2     2     2
2     2    NA     1     1     1
3     3     2     2     2     2
4    NA    NA     0     0     0

Upvotes: 1

akrun
akrun

Reputation: 887058

We can use

library(dplyr)
sample %>%
     rowwise %>%
     mutate(z = sum(!is.na(cur_data()))) %>%
    ungroup

-output

# A tibble: 4 x 3
#      x     y     z
#  <dbl> <dbl> <int>
#1     1     5     2
#2     2    NA     1
#3     3     2     2
#4    NA    NA     0

If it is select columns

sample %>%
    rowwise %>%
    mutate(z = sum(!is.na(select(cur_data(), x:y))))

Or with rowSums on a logical matrix

sample %>%
   mutate(z = rowSums(!is.na(cur_data())))

-output

# A tibble: 4 x 3
#      x     y     z
#  <dbl> <dbl> <dbl>
#1     1     5     2
#2     2    NA     1
#3     3     2     2
#4    NA    NA     0

Upvotes: 1

Related Questions