Reputation: 2716
My goal is to find how many unique values each column of my data frame has. Here is what I came up with:
### df is a data frame, 32 named columns, millions of rows
test1 <- sapply(df, function(x) length(unique(x)))
### I get a named integer from the above command
test2 <- data.frame(names(test1), test1)
### now I get a data frame, with row names
row.names(test2) <- NULL
### to get rid of the row names
test3 <- test2[order(test1),]
### finally I get what I want
My question is: how do I do this in fewer steps?
Upvotes: 1
Views: 262
Reputation: 57
Is this doing what you mean?
test1 <- sort(sapply(df, function(x) length(unique(x))), decreasing = TRUE)
data.frame(names(test1), test1, row.names = NULL)
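If you need this more than once, the two lines above could be wrapped in a small helper. This is only a sketch: the function name count_unique and the column names column and n_unique are my own choices, not anything from the question, and mtcars stands in for your df.

```r
# Hypothetical helper: count distinct values per column and return a sorted data frame.
count_unique <- function(d, decreasing = FALSE) {
  # vapply() guarantees one integer per column, unlike sapply()
  n <- vapply(d, function(x) length(unique(x)), integer(1))
  n <- sort(n, decreasing = decreasing)
  data.frame(column = names(n), n_unique = unname(n), row.names = NULL)
}

res <- count_unique(mtcars)
res
```

Pass decreasing = TRUE to list the columns with the most distinct values first.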
Upvotes: 1
Reputation: 4299
I am not sure if this is what you want.
Please provide a sample of your dataset (with dput).
Suppose you want to count the number of unique values in each column of mtcars.
library(tidyr)
library(dplyr)
mtcars %>%
  gather() %>%
  group_by(key) %>%
  summarise(ndist = n_distinct(value)) %>%
  arrange(desc(ndist))
This will give you
    key ndist
1  qsec    30
2    wt    29
3  disp    27
4   mpg    25
5    hp    22
6  drat    22
7  carb     6
8   cyl     3
9  gear     3
10   vs     2
11   am     2
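Note that gather() has since been superseded in tidyr by pivot_longer(). A possible variant, assuming dplyr 1.0.0 or later (for across()) and tidyr 1.0.0 or later, summarises first and reshapes only the resulting one-row table, which may matter with millions of rows. Treat it as a sketch rather than a drop-in replacement; the name res is just for illustration.

```r
library(dplyr)
library(tidyr)

res <- mtcars %>%
  # one row: the number of distinct values in every column
  summarise(across(everything(), n_distinct)) %>%
  # reshape that single row into key / ndist pairs
  pivot_longer(everything(), names_to = "key", values_to = "ndist") %>%
  arrange(desc(ndist))

res
```

Because the counting happens before the reshape, columns of different types are never stacked into a single value column.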
Upvotes: 3
Reputation: 37879
One call in base R:
#using the same column names as in your example
test1 <- data.frame(names.test1 = colnames(mtcars),
                    test1 = sapply(mtcars, function(x) length(unique(x))),
                    row.names = NULL)
Output:
> test1
   names.test1 test1
1          mpg    25
2          cyl     3
3         disp    27
4           hp    22
5         drat    22
6           wt    29
7         qsec    30
8           vs     2
9           am     2
10        gear     3
11        carb     6
This would then require manual ordering, though, as @BenBolker mentions in the comments:
test1 <- test1[order(test1$test1), ]
However, you could do an ordered one-liner with data.table:
library(data.table)
test1 <- data.table(names.test1 = colnames(mtcars),
                    test1 = sapply(mtcars, function(x) length(unique(x))),
                    key = 'test1')
> test1
    names.test1 test1
 1:          vs     2
 2:          am     2
 3:         cyl     3
 4:        gear     3
 5:        carb     6
 6:          hp    22
 7:        drat    22
 8:         mpg    25
 9:        disp    27
10:          wt    29
11:        qsec    30
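Since data.table is loaded anyway, its uniqueN() function could replace length(unique(x)); it counts distinct values without building the intermediate vector of unique values, which may help on large columns. A sketch of that variant, where the names counts and res are only illustrative:

```r
library(data.table)

# uniqueN() counts the distinct values in each column
counts <- sapply(mtcars, uniqueN)

# key = "test1" sorts the table by the count column, as above
res <- data.table(names.test1 = names(counts),
                  test1 = counts,
                  key = "test1")
res
```

I have not benchmarked this against the version above, so it is worth timing both on your own data.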
Upvotes: 4