Reputation: 4755
I have the following code and am unsure how it would be written using dplyr:
df <- data.frame(
a = c(1, 1, 1, 2, 2, 2, 2, 2),
b = c(1, 2, 3, 2, 3, 2, 3, 2),
c = c(1, 2, 3, 4, 3, 4, 3, 4),
d = c(1, 2, 3, 4, 5, 4, 5, 4),
e = c(1, 2, 3, 2, 3, 4, 3, 5)
)
library(dplyr)

n <- 100
results <- data.frame(levels = double(), amount = double())
for (i in 1:n) {
  # keep only the columns with exactly i distinct values
  r <- df %>% select_if(~ n_distinct(.) == i)
  if (ncol(r) > 0) {
    results <- rbind(results, data.frame(levels = i, amount = ncol(r)))
  }
}
results
which outputs
  levels amount
1      2      1
2      3      1
3      4      1
4      5      2
The use of the for loop and if statement makes me think there must be a nicer approach, or at least one that makes use of dplyr instead.
Data frame with different column types:
df <- data.frame(
a = c(1, 1, 1, 2, 2, 2, 2, 2),
b = c(1, 2, 3, 2, 3, 2, 3, 2),
c = c(1, 2, 3, 4, 3, 4, 3, 4),
d = c(1, 2, 3, 4, 5, 4, 5, 4),
e = c(1, 2, 3, 2, 3, 4, 3, 5),
f = c('a','b','a','a','a','a','a','b')
)
Upvotes: 1
Views: 91
Reputation: 1996
I think this is a better way to do what you want, using dplyr and purrr.
library(tidyverse)
df <- data.frame(
a = c(1, 1, 1, 2, 2, 2, 2, 2),
b = c(1, 2, 3, 2, 3, 2, 3, 2),
c = c(1, 2, 3, 4, 3, 4, 3, 4),
d = c(1, 2, 3, 4, 5, 4, 5, 4),
e = c(1, 2, 3, 2, 3, 4, 3, 5)
)
map_df(df, function(d){
data.frame(level = n_distinct(d))
}) %>%
group_by(level) %>%
summarise(amount = n())
Upvotes: 1
Reputation: 32558
library(dplyr)
library(tidyr)
df %>%
summarise_all(.funs = function(x) length(unique(x))) %>%
pivot_longer(everything()) %>% #OR gather %>%
count(value)
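The scoped verb summarise_all() is superseded in dplyr 1.0.0 and later, so the same idea can be sketched with across(). Because the distinct counts are computed per column before reshaping, this should also work for the mixed-type data frame from the question. The names column, result, and the output column names are illustrative choices, not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Mixed-type data frame from the question (column f is character)
df <- data.frame(
  a = c(1, 1, 1, 2, 2, 2, 2, 2),
  b = c(1, 2, 3, 2, 3, 2, 3, 2),
  c = c(1, 2, 3, 4, 3, 4, 3, 4),
  d = c(1, 2, 3, 4, 5, 4, 5, 4),
  e = c(1, 2, 3, 2, 3, 4, 3, 5),
  f = c('a', 'b', 'a', 'a', 'a', 'a', 'a', 'b')
)

result <- df %>%
  # one row: number of distinct values in each column
  summarise(across(everything(), n_distinct)) %>%
  # reshape to one row per column ("column" is a hypothetical name)
  pivot_longer(everything(), names_to = "column", values_to = "levels") %>%
  # how many columns share each distinct-value count
  count(levels, name = "amount")

result
```

Here a and f both have 2 distinct values, so the row for levels 2 has amount 2.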
Upvotes: 3
Reputation: 389275
A base R approach could be:
stack(table(sapply(df, function(x) length(unique(x)))))
# ind values
#1 2 1
#2 3 1
#3 4 1
#4 5 2
Upvotes: 2
Reputation: 40171
One dplyr and tidyr possibility could be:
df %>%
pivot_longer(everything()) %>%
group_by(name) %>%
summarise(n_levels = n_distinct(value)) %>%
ungroup() %>%
count(n_levels)
  n_levels     n
     <int> <int>
1        2     1
2        3     1
3        4     1
4        5     2
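For the mixed-type data frame from the question, pivoting first runs into a problem: pivot_longer(everything()) cannot combine numeric and character columns into one value column and stops with an error. A minimal sketch of one workaround, assuming it is acceptable to compare values as character strings, converts everything with values_transform before counting. The name result_chr is hypothetical.

```r
library(dplyr)
library(tidyr)

# Mixed-type data frame from the question (column f is character)
df <- data.frame(
  a = c(1, 1, 1, 2, 2, 2, 2, 2),
  b = c(1, 2, 3, 2, 3, 2, 3, 2),
  c = c(1, 2, 3, 4, 3, 4, 3, 4),
  d = c(1, 2, 3, 4, 5, 4, 5, 4),
  e = c(1, 2, 3, 2, 3, 4, 3, 5),
  f = c('a', 'b', 'a', 'a', 'a', 'a', 'a', 'b')
)

result_chr <- df %>%
  # assumption: converting every value to character is acceptable here
  pivot_longer(everything(),
               values_transform = list(value = as.character)) %>%
  group_by(name) %>%
  summarise(n_levels = n_distinct(value)) %>%
  count(n_levels)

result_chr
```

Converting to character does not change the number of distinct values in these columns, so the counts stay comparable with the numeric-only case.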
Upvotes: 3