Benoit_Plante

Reputation: 729

Unique values in each of the columns of a data frame

I want to get the number of unique values in each of the columns of a data frame. Let's say I have the following data frame:

DF <- data.frame(v1 = c(1,2,3,2), v2 = c("a","a","b","b"))

Then it should return that there are 3 distinct values for v1 and 2 for v2.

I tried unique(DF), but it does not work, as every row is different.
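unique() applied to a data frame compares whole rows, so it can only remove duplicated rows. Here all four rows differ, so nothing is dropped. A minimal base-R sketch contrasting this with applying unique() to each column separately (the names rows_kept and per_column are illustrative only):

```r
DF <- data.frame(v1 = c(1, 2, 3, 2), v2 = c("a", "a", "b", "b"))

# unique() on a data frame removes duplicated *rows*; all four rows differ here
rows_kept <- nrow(unique(DF))

# Applying unique() to each column separately gives per-column distinct counts
per_column <- sapply(DF, function(col) length(unique(col)))

print(rows_kept)
print(per_column)
```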

Upvotes: 30

Views: 83661

Answers (8)

Waldi

Reputation: 41260

In dplyr (>= 1.0.0, June 2020):

library(dplyr)
DF %>% summarize_all(n_distinct)

  v1 v2
1  3  2

Upvotes: 1

Jigar Pabari

Reputation: 1

This gives the number of unique values for a single variable:

length(unique(datasetname$variablename))
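The expression above handles one column at a time. One way to apply it to every column at once (a sketch, not the only option) is vapply(), which also checks that each call returns exactly one integer:

```r
DF <- data.frame(v1 = c(1, 2, 3, 2), v2 = c("a", "a", "b", "b"))

# vapply() enforces that every call returns a single integer,
# so the result is always a named integer vector (one entry per column)
n_unique <- vapply(DF, function(col) length(unique(col)), integer(1))
print(n_unique)
```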

Upvotes: 0

Uwe

Reputation: 42592

For the sake of completeness: Since CRAN version 1.9.6 of 19 Sep 2015, the data.table package includes the helper function uniqueN() which saves us from writing

function(x) length(unique(x))

when calling one of the siblings of apply():

sapply(DF, data.table::uniqueN)
v1 v2 
 3  2

Note that using uniqueN() here requires neither loading the data.table package (it only has to be installed) nor coercing DF to class data.table.

Upvotes: 1

Nirali Khoda

Reputation: 388

This gives you the unique values in column 1 of the data frame DF:

unique(DF[, 1])
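To get the unique values of every column rather than just column 1, a possible base-R sketch uses lapply(), which returns a list because columns can have different numbers of unique values. stringsAsFactors = FALSE is set here as an assumption so the result looks the same in every R version:

```r
DF <- data.frame(v1 = c(1, 2, 3, 2), v2 = c("a", "a", "b", "b"),
                 stringsAsFactors = FALSE)

# One list element per column, each holding that column's unique values
unique_values <- lapply(DF, unique)
print(unique_values)
```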

Upvotes: -3

petergensler

Reputation: 342

I think a function like this gives you what you are looking for. In addition to the number of unique values, it also reports how many NAs there are in each of the data frame's columns. Simply pass in your data frame and you are good to go.

totaluniquevals <- function(df) {
  # Empty result frame (local, not global); rows are added as the loop fills them
  x <- data.frame("Row Name" = character(0), "TotalUnique" = numeric(0), "IsNA" = numeric(0))
  result <- sapply(df, function(x) length(unique(x)))   # unique values per column
  isnatotals <- sapply(df, function(x) sum(is.na(x)))   # NA count per column

  # Fill in one row per column of df
  for (i in seq_along(colnames(df))) {
    x[i, 1] <- names(result[i])
    x[i, 2] <- result[[i]]
    x[i, 3] <- isnatotals[[i]]
  }
  return(x)
}

Test:

DF <- data.frame(v1 = c(1,2,3,2), v2 = c("a","a","b","b"))
totaluniquevals(DF)
  Row.Name TotalUnique IsNA
1       v1           3    0
2       v2           2    0
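The function above grows the result one row at a time. A possible loop-free variant builds each summary column with vapply() and creates the data frame in a single call. The name total_unique_summary() is hypothetical and not part of the original answer:

```r
# Hypothetical loop-free variant of totaluniquevals()
total_unique_summary <- function(df) {
  data.frame(
    Row.Name    = names(df),
    # unname() keeps the column names out of the row names
    TotalUnique = unname(vapply(df, function(col) length(unique(col)), integer(1))),
    IsNA        = unname(vapply(df, function(col) sum(is.na(col)), integer(1))),
    stringsAsFactors = FALSE
  )
}

DF <- data.frame(v1 = c(1, 2, 3, 2), v2 = c("a", "a", "b", "b"))
res <- total_unique_summary(DF)
print(res)
```

Building the frame in one call also avoids modifying anything outside the function.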

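You can then call unique() on any column to see the specific unique values. (The factor output below is from R < 4.0.0, where data.frame() converted strings to factors by default; from R 4.0.0 on, v2 is a character vector and the result is [1] "a" "b".)

unique(DF$v2)
[1] a b
Levels: a b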

Upvotes: 0

ben_says

Reputation: 2513

sapply(DF, function(x) length(unique(x)))
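unique() keeps NA as a value, so a column containing missing values gets one extra count. If only non-missing values should be counted, a sketch that drops NA before counting (the data frame DF_na is made up for illustration):

```r
DF_na <- data.frame(v1 = c(1, 2, NA, 2), v2 = c("a", NA, "b", "b"))

# NA counts as one distinct value here
with_na <- sapply(DF_na, function(x) length(unique(x)))

# Drop missing values before counting
without_na <- sapply(DF_na, function(x) length(unique(x[!is.na(x)])))

print(with_na)
print(without_na)
```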

Upvotes: 9

agstudy

Reputation: 121618

Or using unique:

rapply(DF, function(x) length(unique(x)))
v1 v2 
 3  2 
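Another compact base-R alternative, assuming R >= 3.2.0 (where lengths() is available), separates collecting the unique values from counting them:

```r
DF <- data.frame(v1 = c(1, 2, 3, 2), v2 = c("a", "a", "b", "b"))

# lapply() collects each column's unique values; lengths() then returns
# the length of every list element as a named integer vector
counts <- lengths(lapply(DF, unique))
print(counts)
```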

Upvotes: 30

A5C1D2H2I1M1N2O1R2T1

Reputation: 193687

Here's one approach:

> lapply(DF, function(x) length(table(x)))
$v1
[1] 3

$v2
[1] 2

This tabulates the values in each column, and length() of each table gives the number of distinct values. Removing length() shows the tables of unique values themselves.
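One caveat: table() drops NA by default, so this count can differ from length(unique(x)) when a column has missing values. A small illustration (the vector x is made up for the example); passing useNA = "ifany" makes table() keep NA as its own level:

```r
x <- c(1, 2, NA, 2)

n_table    <- length(table(x))                    # NA dropped: 2
n_unique   <- length(unique(x))                   # NA kept:    3
n_table_na <- length(table(x, useNA = "ifany"))   # NA kept:    3

print(c(n_table, n_unique, n_table_na))
```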

Upvotes: 5
