Khashir
Khashir

Reputation: 351

How to count the number of columns with a single value?

I have a dataset where a bunch of character columns only have one value, the name of the column itself. Each row is an observation, and I want to count how many such columns exist for each row.

For example:

id multi_value_col single_value_col_1 single_value_col_2
1  A               single_value_col_1 
2  D2              single_value_col_1 single_value_col_2
3  Z6                                 single_value_col_2

What I'd like is add a column that counts how many of those single value columns there are per row. Like so:

id multi_value_col single_value_col_1 single_value_col_2  count
1  A               single_value_col_1                     1 
2  D2              single_value_col_1 single_value_col_2  2
3  Z6                                 single_value_col_2  1

My initial idea was to use mutate_if and n_distinct, replacing the string by TRUE, which could then be used in a mutate with rowSums:

data %>% 
 mutate_if(~n_distinct(.) == 1, TRUE, .) %>%
 mutate(count = rowSums???)

However, I can't get the mutate_if working, and I'm not sure about the rowSums command either—is there a sum_if TRUE operating available across rows?

Upvotes: 2

Views: 1307

Answers (2)

akrun
akrun

Reputation: 887098

If we are checking with the column names, then

library(tidyverse)
data %>%
    mutate(count = pmap_int(.[-1], ~ {x1 <- c(...)
                 sum(x1 == names(x1))} ))
#  id multi_value_col single_value_col_1 single_value_col_2 count
#1  1               A single_value_col_1                        1
#2  2              D2 single_value_col_1 single_value_col_2     2
#3  3              Z6                    single_value_col_2     1

Or in base R

rowSums(data[-1] == names(data)[-1][col(data[-1])])

If the dataset have NA's, just change it to

rowSums(data[-1] == names(data)[-1][col(data[-1])], na.rm = TRUE)

Other options suggested by @thelatemail include to transpose the selected columns, and do a colSums on the logical matrix

nms <- names(data)[nm1]
colSums(t(data[nms]) == nms)

Or with Reduce

Reduce(`+`, Map(`==`, data[nms], nms))

If the intention is to count based on the non blanks, in base R, we create a logical matrix with the columns of interest and do the rowSums on it

nm1 <- grep("single_value", names(data))
data$count <-  rowSums(data[nm1] != "")

With dplyr

library(dplyr)
data %>% 
    mutate(count = rowSums(.[nm1] != ""))
#  id multi_value_col single_value_col_1 single_value_col_2 count
#1  1               A single_value_col_1                        1
#2  2              D2 single_value_col_1 single_value_col_2     2
#3  3              Z6                    single_value_col_2     1

data

data <- structure(list(id = 1:3, multi_value_col = c("A", "D2", "Z6"), 
    single_value_col_1 = c("single_value_col_1", "single_value_col_1", 
    ""), single_value_col_2 = c("", "single_value_col_2", "single_value_col_2"
    )), row.names = c(NA, -3L), class = "data.frame")

Upvotes: 3

Ronak Shah
Ronak Shah

Reputation: 388982

Depending on if you have NA in your data or blank cells you could use one of the following base R approach where we first find out columns with only one unique value and then count the non-NA or non-blank cells per row in those respective columns.

If you have NA's

cols <- which(sapply(df, function(x) length(unique(na.omit(x)))) == 1)
df$count <- rowSums(!is.na(df[cols]))

df
#  id multi_value_col single_value_col_1 single_value_col_2 count
#1  1               A single_value_col_1               <NA>     1
#2  2              D2 single_value_col_1 single_value_col_2     2
#3  3              Z6               <NA> single_value_col_2     1

If you have empty cells

cols <- which(sapply(df, function(x) length(unique(x[x!=""]))) == 1)
df$count <- rowSums(df[cols] != "")

Upvotes: 1

Related Questions