NightDog

Reputation: 91

Add a new column counting distinct values across multiple columns (row-wise) in R

I have a data frame in which NA denotes a missing value, not a string:

df <- data.frame(A = c(142, 1, 4),
             B = c("NA",1,5),
             c = c("NA","NA","NA"),
             stringsAsFactors = FALSE) 

I want to add a new column D that holds the number of distinct values across A, B and c in each row, not counting NA. The desired output is:

df <- data.frame(A = c(142, 1, 4),
             B = c("NA",1,5),
             c = c("NA","NA","NA"),
             D = c(1, 1, 2),
             stringsAsFactors = FALSE) 

Upvotes: 1

Views: 174

Answers (2)

akrun

Reputation: 886938

We can do this in base R:

df$D <- apply(df, 1, function(x) length(unique(na.omit(x))))
df$D
#[1] 1 2 2
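
The second value differs from the desired 1. A likely cause, assuming the columns still have mixed types when apply() runs: apply() first converts the data frame to a character matrix, and numeric columns are padded to a common width on the way, so "  1" and "1" are counted as two values. A minimal sketch that converts the column types first, using the data from the question:

```r
# Data as posted in the question, with missing values typed as the string "NA"
df <- data.frame(A = c(142, 1, 4),
                 B = c("NA", 1, 5),
                 c = c("NA", "NA", "NA"),
                 stringsAsFactors = FALSE)

# Turn "NA" strings into real NA and let R choose a type for each column,
# so that apply() works on a numeric matrix instead of a character one
df <- type.convert(df, as.is = TRUE)

# Count the distinct non-missing values in each row
df$D <- apply(df, 1, function(x) length(unique(na.omit(x))))
df$D
```

With all columns numeric or logical, no padding takes place and the values are compared as numbers.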

Upvotes: 0

Ronak Shah

Reputation: 388807

Assuming your NAs are real NA values and not the string "NA", you can do this with dplyr (>= 1.0.0):

library(dplyr)

df %>%
  rowwise() %>%
  mutate(D = n_distinct(na.omit(c_across(everything()))))

#     A     B c         D
#  <dbl> <dbl> <lgl> <int>
#1   142    NA NA        1
#2     1     1 NA        1
#3     4     5 NA        2

data

df <- type.convert(df, as.is = TRUE)
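
For the data as posted in the question, where the missing values are typed as the string "NA", this step is what turns them into real NA values. A quick check, as a sketch (the names raw and converted are introduced here for illustration):

```r
raw <- data.frame(A = c(142, 1, 4),
                  B = c("NA", 1, 5),
                  c = c("NA", "NA", "NA"),
                  stringsAsFactors = FALSE)

# Before conversion the "NA" entries are ordinary strings, so nothing is missing
sum(is.na(raw))

converted <- type.convert(raw, as.is = TRUE)

# After conversion they are real missing values: one in B and three in c
colSums(is.na(converted))
```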

Upvotes: 1
