MBeck

Reputation: 167

Delete character vectors from df that contain the same string

I want to identify (and subsequently delete) character vectors in a dataset whose entries are all equal (i.e. that have no variation):

library(dplyr)
test_data <- tibble(a = c("A", "B", "C"), b = c("A", "A", "A"), c = c("", "", ""), d = 1:3)

test_data

# A tibble: 3 x 4
  a     b     c         d
  <chr> <chr> <chr> <int>
1 A     A     ""        1
2 B     A     ""        2
3 C     A     ""        3

I want the result to be something like this:

# A tibble: 3 x 2
  a         d
  <chr> <int>
1 A         1
2 B         2
3 C         3

Of course I can achieve that by doing:

out <- c("b", "c")
test_data %>% select(-one_of(out))

But since I have many such columns and also many rows, I'd prefer not to do it manually.

I found this but it only works for numeric vectors.

Upvotes: 3

Views: 85

Answers (5)

akrun

Reputation: 887048

An option in base R, using Filter and the number of unique elements in each column:

Filter(function(x) length(unique(x)) > 1, test_data)
# A tibble: 3 x 2
#  a         d
#  <chr> <int>
#1 A         1
#2 B         2
#3 C         3

Or with dplyr

library(dplyr)
test_data %>%
  select(where(~ length(unique(.)) > 1))

Upvotes: 1

Darren Tsai

Reputation: 35554

Base R solution

# (1)
test_data[sapply(test_data, function(x) length(unique(x)) > 1)]
# (2)
Filter(function(x) length(unique(x)) > 1, test_data)

dplyr 1.0.0 solution

test_data %>%
  select(where(~ n_distinct(.x) > 1))

Output

# A tibble: 3 x 2
#   a         d
#   <chr> <int>
# 1 A         1
# 2 B         2
# 3 C         3
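
Note that all of the variants above drop every constant column, numeric ones included, whereas the question asks only about character vectors. A minimal base R sketch of that stricter reading, assuming a hypothetical helper `is_constant_chr` and an extra constant numeric column `e` that is not in the original data:

```r
# Same data as in the question, plus a hypothetical constant numeric column `e`
test_data <- data.frame(
  a = c("A", "B", "C"),
  b = c("A", "A", "A"),
  c = c("", "", ""),
  d = 1:3,
  e = c(7, 7, 7),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE only for character columns with one distinct value
is_constant_chr <- function(x) is.character(x) && length(unique(x)) == 1

# Keep every column that is not a constant character column
result <- test_data[!vapply(test_data, is_constant_chr, logical(1))]
names(result)
```

Here `b` and `c` are dropped while the constant numeric column `e` is kept. The same indexing works on a tibble, and with dplyr 1.0.0 or later the helper should also work as `select(!where(is_constant_chr))`.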

Upvotes: 2

Andrew

Reputation: 5138

A little late but you could also use base::Filter() to identify columns that contain only duplicates:

Filter(function(x) !all(duplicated(x)[-1L]), test_data)

# A tibble: 3 x 2
  a         d
  <chr> <int>
1 A         1
2 B         2
3 C         3

Upvotes: 1

tmfmnk

Reputation: 39858

You can do:

test_data %>%
 select_if(~ !all(. == first(.)))

  a         d
  <chr> <int>
1 A         1
2 B         2
3 C         3

Or:

test_data %>%
 select_if(~ n_distinct(.) > 1)
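
One caveat if the real data contains missing values: `. == first(.)` returns NA as soon as a column has an NA, and both `unique()` and `n_distinct()` count NA as a value of its own by default. A minimal base R sketch, assuming you want to ignore NA when deciding whether a column varies (the helper `varies` and the data frame `df` are hypothetical, not from the question):

```r
# Hypothetical data: column `b` is constant apart from one missing value
df <- data.frame(
  a = c("A", "B", "C"),
  b = c("A", NA, "A"),
  d = 1:3,
  stringsAsFactors = FALSE
)

# The equality-based test yields NA rather than TRUE/FALSE for `b`
eq_check <- all(df$b == df$b[1])

# Hypothetical helper: more than one distinct non-missing value?
varies <- function(x) length(unique(x[!is.na(x)])) > 1

# Keep only the columns that vary once NA is ignored
result <- Filter(varies, df)
names(result)
```

With this helper `b` is treated as constant and dropped. In dplyr, `n_distinct(.x, na.rm = TRUE) > 1` expresses the same condition.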

Upvotes: 3

Onyambu

Reputation: 79208

You could also use keep from purrr:

library(purrr)
test_data %>%
  keep(~ length(unique(.)) > 1)
# A tibble: 3 x 2
  a         d
  <chr> <int>
1 A         1
2 B         2
3 C         3

Upvotes: 1
