
Find number of variables on different levels within a dataframe using dplyr?

I have the following code and am unsure how it would be written idiomatically with dplyr:

library(dplyr)

df <- data.frame(
  a = c(1, 1, 1, 2, 2, 2, 2, 2),
  b = c(1, 2, 3, 2, 3, 2, 3, 2),
  c = c(1, 2, 3, 4, 3, 4, 3, 4),
  d = c(1, 2, 3, 4, 5, 4, 5, 4),
  e = c(1, 2, 3, 2, 3, 4, 3, 5)
)

n = 100
results=data.frame(levels=double(),amount=double())
for(i in 1:n){
  r <- df %>% select_if(~n_distinct(.)==i)
  if(dim(r)[2]>0){
    results=rbind(results,data.frame(levels=i,amount=dim(r)[2]))
  }
}
results

which outputs

  levels amount
1      2      1
2      3      1
3      4      1
4      5      2

The for loop and if statement make me think there must be a nicer approach, or at least one that makes fuller use of dplyr.

Edit: a data frame with columns of different types:

df <- data.frame(
  a = c(1, 1, 1, 2, 2, 2, 2, 2),
  b = c(1, 2, 3, 2, 3, 2, 3, 2),
  c = c(1, 2, 3, 4, 3, 4, 3, 4),
  d = c(1, 2, 3, 4, 5, 4, 5, 4),
  e = c(1, 2, 3, 2, 3, 4, 3, 5),
  f = c('a','b','a','a','a','a','a','b')
)

Upvotes: 1

Answers (4)

fmassica

Reputation: 1996

I think this is a better way to do what you want, using dplyr and purrr.

library(tidyverse)

df <- data.frame(
  a = c(1, 1, 1, 2, 2, 2, 2, 2),
  b = c(1, 2, 3, 2, 3, 2, 3, 2),
  c = c(1, 2, 3, 4, 3, 4, 3, 4),
  d = c(1, 2, 3, 4, 5, 4, 5, 4),
  e = c(1, 2, 3, 2, 3, 4, 3, 5)
)

map_df(df, function(d) {
  data.frame(level = n_distinct(d))
}) %>%
  group_by(level) %>%
  summarise(amount = n())

Upvotes: 1

d.b

Reputation: 32558

library(dplyr)
library(tidyr)
df %>%
    summarise_all(.funs = function(x) length(unique(x))) %>%
    pivot_longer(everything()) %>%  #OR gather %>%
    count(value)
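
For reference, a minimal sketch of the same idea written with across(), assuming dplyr 1.0 or later and tidyr 1.0 or later. The column names column, levels and amount are illustrative choices that mirror the question's output, and df is the question's original five-column data frame.

```r
library(dplyr)
library(tidyr)

# The question's original data frame
df <- data.frame(
  a = c(1, 1, 1, 2, 2, 2, 2, 2),
  b = c(1, 2, 3, 2, 3, 2, 3, 2),
  c = c(1, 2, 3, 4, 3, 4, 3, 4),
  d = c(1, 2, 3, 4, 5, 4, 5, 4),
  e = c(1, 2, 3, 2, 3, 4, 3, 5)
)

level_counts <- df %>%
  # One row holding the number of distinct values in each column
  summarise(across(everything(), n_distinct)) %>%
  # Reshape to one row per column
  pivot_longer(everything(), names_to = "column", values_to = "levels") %>%
  # Count how many columns share each number of distinct values
  count(levels, name = "amount")

level_counts
```

Because the columns are summarised before reshaping, every value passed to pivot_longer() is an integer count, so this should also cope with the mixed-type data frame from the question's edit.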

Upvotes: 3

Ronak Shah

Reputation: 389275

A base R approach could be :

stack(table(sapply(df, function(x) length(unique(x)))))

#  ind values
#1   2      1
#2   3      1
#3   4      1
#4   5      2

Upvotes: 2

tmfmnk

Reputation: 40171

One dplyr and tidyr possibility could be:

df %>%
 pivot_longer(everything()) %>%
 group_by(name) %>%
 summarise(n_levels = n_distinct(value)) %>%
 ungroup() %>%
 count(n_levels)

  n_levels     n
     <int> <int>
1        2     1
2        3     1
3        4     1
4        5     2
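
With the mixed-type data frame from the question's edit, pivot_longer(everything()) cannot combine the numeric columns with the character column f and stops with an error. One possible workaround, sketched here under the assumption that converting every column to character first is acceptable (it only affects how values are compared for distinctness), keeps the same long-format pipeline:

```r
library(dplyr)
library(tidyr)

# The mixed-type data frame from the question's edit
df <- data.frame(
  a = c(1, 1, 1, 2, 2, 2, 2, 2),
  b = c(1, 2, 3, 2, 3, 2, 3, 2),
  c = c(1, 2, 3, 4, 3, 4, 3, 4),
  d = c(1, 2, 3, 4, 5, 4, 5, 4),
  e = c(1, 2, 3, 2, 3, 4, 3, 5),
  f = c('a', 'b', 'a', 'a', 'a', 'a', 'a', 'b')
)

mixed_counts <- df %>%
  # Give all columns a common type so they can share one value column
  mutate(across(everything(), as.character)) %>%
  pivot_longer(everything()) %>%
  group_by(name) %>%
  summarise(n_levels = n_distinct(value)) %>%
  ungroup() %>%
  count(n_levels)

mixed_counts
```

Here a and f each have 2 distinct values, b has 3, c has 4, and d and e have 5, so the counts for 2, 3, 4 and 5 levels are 2, 1, 1 and 2.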

Upvotes: 3
