Reputation: 351
I have a dataset where a bunch of character columns only have one value, the name of the column itself. Each row is an observation, and I want to count how many such columns exist for each row.
For example:
id multi_value_col single_value_col_1 single_value_col_2
1  A               single_value_col_1
2  D2              single_value_col_1 single_value_col_2
3  Z6                                 single_value_col_2
What I'd like is to add a column that counts how many of those single-value columns are filled in per row. Like so:
id multi_value_col single_value_col_1 single_value_col_2 count
1  A               single_value_col_1                        1
2  D2              single_value_col_1 single_value_col_2     2
3  Z6                                 single_value_col_2     1
My initial idea was to use mutate_if and n_distinct, replacing the string with TRUE, which could then be used in a mutate with rowSums:
data %>%
  mutate_if(~ n_distinct(.) == 1, TRUE, .) %>%
  mutate(count = rowSums???)
However, I can't get the mutate_if call working, and I'm not sure about the rowSums part either: is there a sum-if-TRUE operation available across rows?
Upvotes: 2
Views: 1307
Reputation: 887098
If we are checking against the column names, then
library(tidyverse)
data %>%
  mutate(count = pmap_int(.[-1], ~ {
    x1 <- c(...)
    sum(x1 == names(x1))
  }))
#  id multi_value_col single_value_col_1 single_value_col_2 count
#1  1               A single_value_col_1                        1
#2  2              D2 single_value_col_1 single_value_col_2     2
#3  3              Z6                    single_value_col_2     1
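To see what the pmap_int step is doing: the columns are walked in parallel, so for each row c(...) collects a named vector whose names are the column names. For the second row of the example data that is
x1 <- c(multi_value_col = "D2",
        single_value_col_1 = "single_value_col_1",
        single_value_col_2 = "single_value_col_2")
sum(x1 == names(x1))  # each value compared against its own column name
# [1] 2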
Or in base R
rowSums(data[-1] == names(data)[-1][col(data[-1])])
If the dataset has NA's, just add na.rm = TRUE:
rowSums(data[-1] == names(data)[-1][col(data[-1])], na.rm = TRUE)
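The trick here is that col(data[-1]) holds the column position of every cell, so indexing names(data)[-1] with it repeats each column's name once per cell (in column-major order) and the element-wise == then pairs every cell with its own column name. With the example data:
names(data)[-1][col(data[-1])]
# [1] "multi_value_col"    "multi_value_col"    "multi_value_col"
# [4] "single_value_col_1" "single_value_col_1" "single_value_col_1"
# [7] "single_value_col_2" "single_value_col_2" "single_value_col_2"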
Other options suggested by @thelatemail include transposing the selected columns and doing colSums on the logical matrix
nm1 <- grep("single_value", names(data))  # columns of interest (also used further down)
nms <- names(data)[nm1]
colSums(t(data[nms]) == nms)  # each transposed row is compared against its own column name
Or with Reduce
Reduce(`+`, Map(`==`, data[nms], nms))
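As an illustration, Map compares each selected column with its own name, giving one logical vector per column, and Reduce(`+`) then adds those vectors element-wise to get the per-row count:
Map(`==`, data[nms], nms)
# $single_value_col_1
# [1]  TRUE  TRUE FALSE
#
# $single_value_col_2
# [1] FALSE  TRUE  TRUE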
If the intention is to count based on the non-blank cells, in base R we create a logical matrix from the columns of interest and do rowSums on it
nm1 <- grep("single_value", names(data))
data$count <- rowSums(data[nm1] != "")
With dplyr
library(dplyr)
data %>%
  mutate(count = rowSums(.[nm1] != ""))
#  id multi_value_col single_value_col_1 single_value_col_2 count
#1  1               A single_value_col_1                        1
#2  2              D2 single_value_col_1 single_value_col_2     2
#3  3              Z6                    single_value_col_2     1
data <- structure(list(id = 1:3, multi_value_col = c("A", "D2", "Z6"),
single_value_col_1 = c("single_value_col_1", "single_value_col_1",
""), single_value_col_2 = c("", "single_value_col_2", "single_value_col_2"
)), row.names = c(NA, -3L), class = "data.frame")
Upvotes: 3
Reputation: 388982
Depending on whether you have NA's or blank cells in your data, you could use one of the following base R approaches, where we first find the columns with only one unique value and then count the non-NA or non-blank cells per row in those columns.
If you have NA's
cols <- which(sapply(df, function(x) length(unique(na.omit(x)))) == 1)
df$count <- rowSums(!is.na(df[cols]))
df
#  id multi_value_col single_value_col_1 single_value_col_2 count
#1  1               A single_value_col_1               <NA>     1
#2  2              D2 single_value_col_1 single_value_col_2     2
#3  3              Z6               <NA> single_value_col_2     1
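Note that this output assumes an NA version of the example data; the dput in the answer above uses empty strings, so one way to reproduce it is to convert the blanks first (shown only as an assumption, treating blanks and NAs as interchangeable):
df <- data
df[df == ""] <- NA  # treat blank cells as missing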
If you have empty cells
cols <- which(sapply(df, function(x) length(unique(x[x!=""]))) == 1)
df$count <- rowSums(df[cols] != "")
Upvotes: 1