I want to identify (and subsequently delete) the character columns of a dataset whose entries are all equal (i.e. columns that have no variation):
library(dplyr)
test_data <- tibble(a = c("A", "B", "C"), b = c("A", "A", "A"), c = c("", "", ""), d = 1:3)
test_data
# A tibble: 3 x 4
a b c d
<chr> <chr> <chr> <int>
1 A A "" 1
2 B A "" 2
3 C A "" 3
I want the result to be something like this:
# A tibble: 3 x 2
a d
<chr> <int>
1 A 1
2 B 2
3 C 3
Of course I can achieve that by doing:
out <- c("b", "c")
test_data %>% select(-all_of(out))
But since I have many such columns and also many rows, I'd prefer not to do this manually.
I found this but it only works for numeric vectors.
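Because unique() works on vectors of any type, counting the distinct values in each column is one way to do the "identify" step for character columns as well as numeric ones. A minimal base-R sketch; n_unique and constant_cols are illustrative names, and data.frame() stands in for tibble() only to avoid a package dependency:

```r
# Rebuild the example data with base R (same columns as the tibble above)
test_data <- data.frame(
  a = c("A", "B", "C"),
  b = c("A", "A", "A"),
  c = c("", "", ""),
  d = 1:3,
  stringsAsFactors = FALSE
)

# Number of distinct values per column; unique() is type-agnostic,
# so character and numeric columns are handled the same way
n_unique <- vapply(test_data, function(x) length(unique(x)), integer(1))
n_unique
# a b c d
# 3 1 1 3

# Columns with a single distinct value are the ones to delete
constant_cols <- names(n_unique)[n_unique == 1]
constant_cols
# [1] "b" "c"
```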
An option in base R with Filter() and the length of the unique elements:
Filter(function(x) length(unique(x)) > 1, test_data)
# A tibble: 3 x 2
# a d
# <chr> <int>
#1 A 1
#2 B 2
#3 C 3
Or with dplyr (where() requires dplyr >= 1.0.0):
library(dplyr)
test_data %>%
  select(where(~ length(unique(.)) > 1))
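One caveat for the length(unique(x)) test: NA counts as a value of its own, so a column such as c(NA, "A", "A") has two distinct values and is kept. If missing values should be ignored, they can be dropped before counting. A base-R sketch; column e is hypothetical and only there to show the NA case:

```r
# Small example with a hypothetical column e that contains a missing value
test_data <- data.frame(
  a = c("A", "B", "C"),
  b = c("A", "A", "A"),
  e = c(NA, "A", "A"),
  stringsAsFactors = FALSE
)

# NA counted as its own value: e has 2 distinct values and is kept
keep_na_counted <- Filter(function(x) length(unique(x)) > 1, test_data)
names(keep_na_counted)
# [1] "a" "e"

# NA removed before counting: e has 1 distinct value and is dropped
keep_na_ignored <- Filter(function(x) length(unique(x[!is.na(x)])) > 1, test_data)
names(keep_na_ignored)
# [1] "a"
```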
Base R solution
# (1)
test_data[sapply(test_data, function(x) length(unique(x)) > 1)]
# (2)
Filter(function(x) length(unique(x)) > 1, test_data)
dplyr (>= 1.0.0) solution
test_data %>%
  select(where(~ n_distinct(.x) > 1))
Output
# # A tibble: 3 x 2
# a d
# <chr> <int>
# 1 A 1
# 2 B 2
# 3 C 3
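Since the question asks to identify the columns first and then delete them, the same predicate can also produce the vector of names for the question's own select() call. A sketch assuming dplyr >= 1.0.0 (for all_of()); out mirrors the vector the question builds by hand:

```r
library(dplyr)

test_data <- tibble(a = c("A", "B", "C"), b = c("A", "A", "A"),
                    c = c("", "", ""), d = 1:3)

# Identify: names of the columns with exactly one distinct value
out <- names(test_data)[vapply(test_data, function(x) n_distinct(x) == 1, logical(1))]
out
# [1] "b" "c"

# Delete: the question's manual step, now with `out` computed automatically
result <- test_data %>% select(-all_of(out))
names(result)
# [1] "a" "d"
```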
A little late, but you could also use base::Filter() together with duplicated() to drop the columns whose values are all repeats of the first one:
Filter(function(x) !all(duplicated(x)[-1L]), test_data)
# A tibble: 3 x 2
a d
<chr> <int>
1 A 1
2 B 2
3 C 3
You can do:
library(dplyr)
test_data %>%
  select_if(~ !all(. == first(.)))
# A tibble: 3 x 2
a d
<chr> <int>
1 A 1
2 B 2
3 C 3
Or:
test_data %>%
  select_if(~ n_distinct(.) > 1)
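select_if() is superseded as of dplyr 1.0.0 in favour of select() with where(). A sketch of the first() comparison in that style, assuming dplyr >= 1.0.0; all_equal_first is a hypothetical helper, and the isTRUE() wrapper is an added safeguard so that the NA produced by comparing missing values counts as "not constant" instead of giving a missing predicate result:

```r
library(dplyr)

test_data <- tibble(a = c("A", "B", "C"), b = c("A", "A", "A"),
                    c = c("", "", ""), d = 1:3)

# Hypothetical helper: TRUE only when every element equals the first one.
# isTRUE() maps a missing result (from NA comparisons) to FALSE,
# so such columns are treated as not constant and kept.
all_equal_first <- function(x) isTRUE(all(x == first(x)))

# `!` negates the tidyselect selection: keep every column that is not constant
kept <- test_data %>% select(!where(all_equal_first))
names(kept)
# [1] "a" "d"
```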
You could also use purrr::keep():
library(purrr)
test_data %>%
  keep(~ length(unique(.)) > 1)
# A tibble: 3 x 2
a d
<chr> <int>
1 A 1
2 B 2
3 C 3
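purrr also has discard(), the complement of keep(), which reads naturally when the predicate describes the columns to remove. A sketch, with data.frame() used in place of tibble() so that only purrr is needed:

```r
library(purrr)

test_data <- data.frame(
  a = c("A", "B", "C"),
  b = c("A", "A", "A"),
  c = c("", "", ""),
  d = 1:3,
  stringsAsFactors = FALSE
)

# discard() drops every column for which the predicate returns TRUE,
# here the columns with exactly one distinct value
no_constants <- discard(test_data, ~ length(unique(.x)) == 1)
names(no_constants)
# [1] "a" "d"
```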