I'm trying to produce a simple, straightforward output that shows (1) the number of distinct values per variable and (2) the distinct values themselves for variables with fewer than X distinct values. When you run this:
sapply(mtcars, function(z) NROW(unique(z)))
It gives simple, straightforward information per variable:
mpg 25
cyl 3
disp 27
hp 22
drat 22
wt 29
qsec 30
vs 2
am 2
gear 3
carb 6
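For a data frame column, NROW(unique(z)) is the same as length(unique(z)). A minimal sketch of the same count using vapply(), which, unlike sapply(), checks that every result is a single integer (n_unique is just an illustrative name):

```r
# Distinct-value count per column; vapply() checks that every result
# is a single integer, so the output is always a named integer vector.
n_unique <- vapply(mtcars, function(z) length(unique(z)), integer(1))
n_unique
```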
However, I still need to show the distinct values. Let's say we only show the distinct values for variables with fewer than 10 distinct values. This is what I have so far:
sapply(mtcars, function(z) if(NROW(unique(z)) < 10) {paste0(NROW(unique(z)), " ; ", unique(z))} else {NROW(unique(z))})
However, it produces a messy summary. I'm looking for something like this:
mpg 25
cyl 3 ; 6 4 8
disp 27
hp 22
drat 22
wt 29
qsec 30
vs 2 ; 0 1
am 2 ; 1 0
gear 3 ; 4 3 5
carb 6 ; 4 1 2 3 6 8
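The mess in the attempt above comes from paste0() being vectorized: for a variable with k distinct values it returns k strings, and because the results then have different lengths, sapply() cannot simplify them and returns a list. A small sketch on the cyl column (u, split_up, with_commas and with_spaces are illustrative names):

```r
# paste0() is vectorized: pasting one count onto three distinct values
# yields three separate strings instead of one.
u <- unique(mtcars$cyl)                  # 6 4 8 (order of first appearance)
split_up <- paste0(length(u), " ; ", u)  # "3 ; 6" "3 ; 4" "3 ; 8"

# Collapsing the values first gives a single string per variable.
with_commas <- paste0(length(u), " ; ", toString(u))               # "3 ; 6, 4, 8"
with_spaces <- paste0(length(u), " ; ", paste(u, collapse = " "))  # "3 ; 6 4 8"
```

Using either collapsed form inside the function from the question makes every result length 1, so sapply() can simplify to a named character vector.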
Here is an option that I came up with. First, I created a data frame containing the number of unique values in each variable, tmp1. Then I created a character vector containing the unique values of each variable, tmp2. Since you want to print the unique values only when there are fewer than 10 of them, I handled that in the if_else() part. Finally, I bound tmp1 and tmp2 together and changed the order of the columns as well as one variable name.
library(dplyr)

# Number of distinct values per column; stack() reshapes the one-row
# result into a long data frame with columns values and ind
summarize_all(mtcars,
              .funs = list(~n_distinct(.))) %>%
  stack -> tmp1

# Distinct values as one comma-separated string per column,
# or a placeholder when a column has 10 or more distinct values
summarize_all(mtcars,
              .funs = list(~if_else(n_distinct(.) < 10,
                                    toString(unique(.)),
                                    "At least 10 unique values"))) %>%
  unlist -> tmp2

# Combine, then put the variable name first
bind_cols(tmp1, distinct_value = tmp2) %>%
  select(variable = ind, everything())
# variable values distinct_value
#1 mpg 25 At least 10 unique values
#2 cyl 3 6, 4, 8
#3 disp 27 At least 10 unique values
#4 hp 22 At least 10 unique values
#5 drat 22 At least 10 unique values
#6 wt 29 At least 10 unique values
#7 qsec 30 At least 10 unique values
#8 vs 2 0, 1
#9 am 2 1, 0
#10 gear 3 4, 3, 5
#11 carb 6 4, 1, 2, 3, 6, 8
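For comparison, a base R sketch that builds a table with the same three columns without dplyr. The names uniq_list, counts and result are illustrative, and the placeholder text is "At least 10 unique values" because the condition is fewer than 10:

```r
# One row per column of mtcars: the distinct-value count and, for columns
# with fewer than 10 distinct values, the values themselves.
uniq_list <- lapply(mtcars, unique)      # distinct values, in order of appearance
counts    <- unname(lengths(uniq_list))  # number of distinct values per column

result <- data.frame(
  variable       = names(uniq_list),
  values         = counts,
  distinct_value = ifelse(counts < 10,
                          vapply(uniq_list, toString, character(1)),
                          "At least 10 unique values")
)
result
```

ifelse() is fine here because it picks element-wise from two vectors of the same length; vapply(..., character(1)) guarantees one string per column.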
I went with @H1's answer, as it gives exactly the output I expected (a simple one):
sapply(mtcars, function(z) {
  u <- unique(z)
  if (length(u) < 10) paste0(length(u), "; ", toString(u)) else length(u)
})
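This returns a named character vector (sapply() coerces the integer counts to character so that all results share one type), which prints horizontally. A minimal sketch of printing it one variable per line, closer to the layout in the question; res is an illustrative name:

```r
# The accepted one-liner, spread over a few lines
res <- sapply(mtcars, function(z) {
  u <- unique(z)
  if (length(u) < 10) paste0(length(u), "; ", toString(u)) else length(u)
})

# Mixed integer/character results were coerced to a named character vector;
# cat() prints one "name value" pair per line.
cat(paste(names(res), res), sep = "\n")
```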
You can try something like this, which creates a nested list conditional on the number of unique values in each column:
sapply(mtcars, function(x) {
  uniq <- unique(x)
  if (length(uniq) < 10)
    list(no_uniq_values = length(uniq), uniq_values = uniq)
  else
    length(uniq)
})
#$mpg
#[1] 25
#$cyl
#$cyl$no_uniq_values
#[1] 3
#$cyl$uniq_values
#[1] 6 4 8
#$disp
#[1] 27
#$hp
#[1] 22
#$drat
#[1] 22
#.....
#.....
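Because the elements of this list come in two shapes (a bare count, or a list holding the count and the values), code that reads it back has to check which shape it got. A short sketch; res, few and counts are illustrative names:

```r
res <- sapply(mtcars, function(x) {
  uniq <- unique(x)
  if (length(uniq) < 10)
    list(no_uniq_values = length(uniq), uniq_values = uniq)
  else
    length(uniq)
})

# Elements with few distinct values are lists; the rest are bare counts
few <- Filter(is.list, res)
names(few)            # "cyl" "vs" "am" "gear" "carb"
res$cyl$uniq_values   # 6 4 8

# Pull the count back out of either shape
counts <- vapply(res, function(el) if (is.list(el)) el$no_uniq_values else el,
                 integer(1))
```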