pranav nerurkar

Reputation: 648

Count unique elements in columns of dataframe

The data frame below has 59 columns:

circleid  name  birthday  56 more...
1         1    1       
2         2    10
2         5     68
2         1    10
1         1    1

Result I want

circleid  distinct_name  distinct_birthday  56 more...
1         1              1       
2         3              2


quiz <- read.csv("https://raw.githubusercontent.com/pranavn91/PhD/master/Expt/circles-removed-na.csv", header = TRUE)

So far I have this, which works for one column:

library(plyr)
ddply(quiz, ~circleid, summarise, number_of_distinct_name = length(unique(name)))

How do I get the counts for every column of the data frame? I tried a loop:

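columns <- colnames(quiz)

for (i in 1:58) {
  # note: columns[i] is only the column's name (a string), not its values,
  # and final is overwritten on every iteration
  final <- ddply(quiz, ~circleid, summarise, number_of_distinct_name = length(unique(columns[i])))
}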
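For reference, here is a hypothetical base-R version of the loop idea. It is not taken from the answers below. It is a sketch on the five-row sample shown above and makes three changes. It pulls each column's values with quiz[[col]] instead of using the name string. It counts distinct values per circleid with tapply(). It adds one distinct_ column per iteration instead of overwriting final.

```r
# Five-row sample from the question (the real file has 59 columns)
quiz <- data.frame(
  circleid = c(1, 2, 2, 2, 1),
  name     = c(1, 2, 5, 1, 1),
  birthday = c(1, 10, 68, 10, 1)
)

value_cols <- setdiff(colnames(quiz), "circleid")  # every column except the grouping one
groups <- sort(unique(quiz$circleid))
final <- data.frame(circleid = groups)

for (col in value_cols) {
  # quiz[[col]] gives the column's values; tapply() splits them by circleid
  counts <- tapply(quiz[[col]], quiz$circleid, function(x) length(unique(x)))
  # tapply() names its result by group label, so index by those labels
  final[[paste0("distinct_", col)]] <- as.vector(counts[as.character(groups)])
}

final
```

On the sample this gives the table under "Result I want". With the full file the loop simply runs once per non-grouping column.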

Upvotes: 0

Views: 68

Answers (3)

pieca

Reputation: 2563

With data.table you can run:

library(data.table)
quiz <- fread("https://raw.githubusercontent.com/pranavn91/PhD/master/Expt/circles-removed-na.csv", header = T)
unique_vals <- quiz[, lapply(.SD, uniqueN), by = circleid]
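You can check what this produces without downloading the CSV. The sketch below builds the question's five-row sample directly as a data.table; the inline data stands in for the file.

```r
library(data.table)

# Five-row sample from the question, built inline instead of read with fread()
quiz <- data.table(
  circleid = c(1, 2, 2, 2, 1),
  name     = c(1, 2, 5, 1, 1),
  birthday = c(1, 10, 68, 10, 1)
)

# .SD holds every non-grouping column for the current circleid group;
# uniqueN() counts the distinct values in each of them
unique_vals <- quiz[, lapply(.SD, uniqueN), by = circleid]
unique_vals
```

by = circleid keeps the groups in order of first appearance. keyby = circleid would also sort and key the result by circleid.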

Upvotes: 1

Rui Barradas

Reputation: 76402

With package dplyr this is simple. The original answer had length(unique(.)) but @akrun pointed me to n_distinct(.) in a comment.

library(dplyr)

quiz %>%
  group_by(circleid) %>%
  summarise_all(n_distinct)
## A tibble: 2 x 3
#  circleid  name birthday
#     <int> <int>    <int>
#1        1     1        1
#2        2     3        2

Data.

quiz <- read.table(text = "
circleid  name  birthday
1         1    1       
2         2    10
2         5     68
2         1    10
1         1    1
", header = TRUE)

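In dplyr 1.0.0 and later, summarise_all() is superseded by across(). Below is a sketch of the same summary using across(), assuming dplyr >= 1.0.0. The .names argument renames the outputs to match the distinct_ names in the question.

```r
library(dplyr)

quiz <- data.frame(
  circleid = c(1, 2, 2, 2, 1),
  name     = c(1, 2, 5, 1, 1),
  birthday = c(1, 10, 68, 10, 1)
)

# Inside a grouped summarise, everything() skips the grouping column,
# so each remaining column becomes its count of distinct values per circleid
result <- quiz %>%
  group_by(circleid) %>%
  summarise(across(everything(), n_distinct, .names = "distinct_{.col}"))
result
```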
Upvotes: 1

jyjek

Reputation: 2707

You can use dplyr:

library(dplyr)

result <- quiz %>%
  group_by(circleid) %>%
  summarise_all(n_distinct)

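A microbenchmark of data.table and dplyr. quiz must be a data.table here, e.g. read with fread(), and the data.table expression uses length(unique(x)) rather than uniqueN():

library(microbenchmark)
library(data.table)
library(dplyr)

microbenchmark(x1 = quiz[, lapply(.SD, function(x) length(unique(x))), by = circleid],
               x2 = quiz %>%
                 group_by(circleid) %>%
                 summarise_all(n_distinct),
               times = 100)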
Unit: milliseconds
 expr       min        lq      mean    median        uq       max neval cld
   x1 150.06392 155.02227 158.75775 156.49328 158.38887 224.22590   100   b
   x2  41.07139  41.90953  42.95186  42.54135  43.97387  49.91495   100  a 
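If neither package is available, base R's aggregate() can do the same grouping. Below is a minimal sketch on the question's sample data; this approach was not part of the benchmark above.

```r
quiz <- data.frame(
  circleid = c(1, 2, 2, 2, 1),
  name     = c(1, 2, 5, 1, 1),
  birthday = c(1, 10, 68, 10, 1)
)

# The formula ". ~ circleid" applies FUN to every other column
# within each circleid group; groups come back sorted by circleid
result <- aggregate(. ~ circleid, data = quiz, FUN = function(x) length(unique(x)))
result
```

The formula method of aggregate() drops any row that contains an NA, because its default na.action is na.omit. This matters if the data still has missing values.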

Upvotes: 1
