Reputation: 729
I want to get the number of unique values in each of the columns of a data frame. Let's say I have the following data frame:
DF <- data.frame(v1 = c(1,2,3,2), v2 = c("a","a","b","b"))
then it should return that there are 3 distinct values for v1, and 2 for v2.
I tried unique(DF), but it does not work because every row is different.
Upvotes: 30
Views: 83661
Reputation: 41260
In dplyr (>= 1.0.0, June 2020):
DF %>% summarize_all(n_distinct)
v1 v2
1 3 2
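In dplyr 1.0.0 and later, summarize_all() is marked as superseded in favor of across(). A sketch of the equivalent call, assuming dplyr >= 1.0.0 is installed (res is just an illustrative name, not part of the original answer):

```r
library(dplyr)

# Data frame from the question
DF <- data.frame(v1 = c(1, 2, 3, 2), v2 = c("a", "a", "b", "b"))

# across(everything(), n_distinct) applies n_distinct() to every column
res <- DF %>% summarise(across(everything(), n_distinct))
res
```

The result is a one-row data frame with one count per column, the same shape that summarize_all(n_distinct) returns.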
Upvotes: 1
Reputation: 1
This should work for getting the number of unique values of a single variable:
length(unique(datasetname$variablename))
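The one-liner above counts a single column. To cover every column at once, as the question asks, the same expression can be applied column-wise. A minimal sketch in base R, using DF from the question (counts is a name introduced here for illustration):

```r
# Data frame from the question
DF <- data.frame(v1 = c(1, 2, 3, 2), v2 = c("a", "a", "b", "b"))

# Apply length(unique(...)) to each column; vapply() checks that every
# result is a single integer
counts <- vapply(DF, function(x) length(unique(x)), integer(1))
counts
```

Using vapply() instead of sapply() is a design choice: it fails loudly if a column ever yields something other than one integer.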
Upvotes: 0
Reputation: 42592
For the sake of completeness: Since CRAN version 1.9.6 of 19 Sep 2015, the data.table package includes the helper function uniqueN(), which saves us from writing function(x) length(unique(x)) when calling one of the siblings of apply():
sapply(DF, data.table::uniqueN)
v1 v2
 3  2
Note that neither the data.table package needs to be loaded nor DF coerced to class data.table in order to use uniqueN() here.
Upvotes: 1
Reputation: 388
This will give you the unique values in column 1 of the DF data frame. Wrap the call in length() to get their number.
unique(DF[, 1])
Upvotes: -3
Reputation: 342
I think a function like this would give you what you are looking for. It reports the number of unique values in each column of the data frame, in addition to how many NAs each column contains. Simply plug in your data frame, and you are good to go.
totaluniquevals <- function(df) {
  # Build the summary locally; no global assignment (<<-) is needed
  data.frame(
    Row.Name = names(df),
    # Number of distinct values per column
    TotalUnique = sapply(df, function(x) length(unique(x))),
    # Number of missing values per column
    IsNA = sapply(df, function(x) sum(is.na(x))),
    # Use 1, 2, ... as row names instead of the column names
    row.names = NULL
  )
}
Test:
DF <- data.frame(v1 = c(1,2,3,2), v2 = c("a","a","b","b"))
totaluniquevals(DF)
Row.Name TotalUnique IsNA
1 v1 3 0
2 v2 2 0
You can then use unique on whatever column, to see what the specific unique values are.
unique(DF$v2)
[1] a b
Levels: a b
Upvotes: 0
Reputation: 121618
Or using unique:
rapply(DF, function(x) length(unique(x)))
v1 v2
3 2
Upvotes: 30
Reputation: 193687
Here's one approach:
> lapply(DF, function(x) length(table(x)))
$v1
[1] 3
$v2
[1] 2
This basically tabulates the unique values per column. Using length on that tells you the number. Removing length will show you the actual table of unique values.
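One caveat when comparing the table() approach with length(unique(x)): table() drops NA by default, while unique() keeps NA as a value, so the two counts can differ on columns with missing data. A small sketch on a made-up vector (x and the n_* names are illustrative only):

```r
# Hypothetical column with one missing value
x <- c(1, 2, NA, 2)

n_unique       <- length(unique(x))                  # NA counts as a value
n_table        <- length(table(x))                   # table() drops NA by default
n_table_na     <- length(table(x, useNA = "ifany"))  # include NA in the tabulation
n_unique_no_na <- length(unique(x[!is.na(x)]))       # exclude NA explicitly

c(n_unique, n_table, n_table_na, n_unique_no_na)
```

Which count is the right one depends on whether NA should be treated as a distinct value in your data.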
Upvotes: 5