daniell

Reputation: 179

Remove columns that have only one unique value

I want to remove columns that contain only one unique value.

First, I tried it on a single column, and it works:

data %>%
  select_if(length(unique(data$policy_id)) > 1)

Then I tried it on multiple columns:

data %>%
  select_if(length(unique(data[, c("policy_date", "policy_id")])) > 1)

But it does not work. I think I am making a conceptual mistake due to my lack of experience.

Thanks in advance.

Upvotes: 7

Views: 1444

Answers (5)

akrun

Reputation: 886938

An option with base R

df[sapply(df, function(x) length(unique(x))) > 1]

data

df <- data.frame(A = LETTERS[1:5], B = 1:5, C = 2)

Upvotes: 0

AlexB

Reputation: 3269

Another option would be to use purrr:

df %>% purrr::keep(~all(n_distinct(.) > 1))
df %>% purrr::keep(~all(length(unique(.)) > 1))

df %>% purrr::discard(~!all(n_distinct(.) > 1))
df %>% purrr::discard(~!all(length(unique(.)) > 1))

Combining table with apply gives the same output for this data:

df[, apply(df, 2, function(i) length(table(i)) > 1)]

df <- data.frame(A = LETTERS[1:5], B = 1:5, C = 2)
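One caveat worth checking on your own data: the unique-based and table-based tests can disagree when a column contains missing values, because unique() counts NA as a value while table() drops NA by default. A minimal base R sketch, where df_na is a made-up data frame and not the asker's data:

```r
# Hypothetical data: column C is constant apart from one missing value
df_na <- data.frame(A = LETTERS[1:5], B = 1:5, C = c(2, 2, NA, 2, 2))

# unique() treats NA as a distinct value, so C appears to have two values
keep_unique <- sapply(df_na, function(x) length(unique(x)) > 1)

# table() drops NA by default, so C appears constant
keep_table <- sapply(df_na, function(x) length(table(x)) > 1)

names(df_na)[keep_unique]  # columns kept by the unique() test
names(df_na)[keep_table]   # columns kept by the table() test
```

Which behaviour is right depends on whether a column that is constant apart from NAs should count as constant for your purposes.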

Upvotes: 0

ThomasIsCoding

Reputation: 101034

Some base R options:

  • Using lengths + unique + sapply
subset(df,select = lengths(sapply(df,unique))>1)
  • Using Filter + length + unique
Filter(function(x) length(unique(x))>1,df)
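A note on the first option: sapply() simplifies its result to a matrix when every column has the same number of distinct values (and more than one), and lengths() then no longer returns one count per column. A hedged variant that swaps in lapply(), which always returns a list, might look like this (df_same is a made-up example, not data from the question):

```r
# Hypothetical data: both columns have three distinct values
df_same <- data.frame(A = c(10, 20, 30), B = 1:3)

# sapply() simplifies equal-length results into a matrix here
is.matrix(sapply(df_same, unique))

# lapply() always returns a list, so lengths() gives one count per column
keep <- lengths(lapply(df_same, unique)) > 1
df_same[keep]
```

The Filter() option above does not have this problem, since it applies the predicate to each column separately.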

Upvotes: 3

Karthik S

Reputation: 11584

Does this work?

> df <- data.frame(col1 = 1:10,
+                  col2 = rep(10,10),
+                  col3 = round(rnorm(10,1)))
> df
   col1 col2 col3
1     1   10    1
2     2   10    0
3     3   10    1
4     4   10    1
5     5   10    1
6     6   10    0
7     7   10    2
8     8   10    1
9     9   10    1
10   10   10    1
> df %>% select_if(~length(unique(.)) > 1)
   col1 col3
1     1    1
2     2    0
3     3    1
4     4    1
5     5    1
6     6    0
7     7    2
8     8    1
9     9    1
10   10    1

Upvotes: 1

Allan Cameron

Reputation: 173793

You can use select(where()).

Suppose I have a data frame like this:

df <- data.frame(A = LETTERS[1:5], B = 1:5, C = 2)

df
#>   A B C
#> 1 A 1 2
#> 2 B 2 2
#> 3 C 3 2
#> 4 D 4 2
#> 5 E 5 2

Then I can do:

df %>% select(where(~ n_distinct(.) > 1))

#>   A B
#> 1 A 1
#> 2 B 2
#> 3 C 3
#> 4 D 4
#> 5 E 5

Upvotes: 4
