Reputation: 648
For the data frame below, there are 59 columns:
circleid name birthday 56 more...
1 1 1
2 2 10
2 5 68
2 1 10
1 1 1
The result I want:
circleid distinct_name distinct_birthday 56 more...
1 1 1
2 3 2
quiz <- read.csv("https://raw.githubusercontent.com/pranavn91/PhD/master/Expt/circles-removed-na.csv", header = T)
What I have so far:
ddply(quiz,~circleid,summarise,number_of_distinct_name=length(unique(name)))
This works for one column. How do I get the same counts for every column of the data frame? My attempt with a loop:
columns <- colnames(quiz)
for (i in c(1:58))
{
final <- ddply(quiz,~circleid,summarise,number_of_distinct_name=length(unique(columns[i])))
}
Upvotes: 0
Views: 68
Reputation: 2563
With data.table you can run:
library(data.table)
quiz <- fread("https://raw.githubusercontent.com/pranavn91/PhD/master/Expt/circles-removed-na.csv", header = T)
unique_vals <- quiz[, lapply(.SD, uniqueN), by = circleid]
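To make the result concrete, here is a minimal sketch of the same call on the five sample rows from the question. It assumes only the three columns shown there; the `value_cols` helper and the `distinct_` renaming step are illustrative additions, not part of the answer above, included because the question asks for column names in that form.

```r
library(data.table)

# Sample rows from the question (only the three columns shown there)
quiz <- data.table(
  circleid = c(1, 2, 2, 2, 1),
  name     = c(1, 2, 5, 1, 1),
  birthday = c(1, 10, 68, 10, 1)
)

# .SD holds every non-grouping column; uniqueN() counts distinct values
unique_vals <- quiz[, lapply(.SD, uniqueN), by = circleid]

# Hypothetical extra step: prefix the counted columns with "distinct_"
value_cols <- setdiff(names(unique_vals), "circleid")
setnames(unique_vals, value_cols, paste0("distinct_", value_cols))

print(unique_vals)
```

For circleid 2 this should report 3 distinct names and 2 distinct birthdays, matching the desired result in the question.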
Upvotes: 1
Reputation: 76402
With package dplyr this is simple. The original answer had length(unique(.)) but @akrun pointed me to n_distinct(.) in a comment.
library(dplyr)
quiz %>%
group_by(circleid) %>%
summarise_all(n_distinct)
# A tibble: 2 x 3
#  circleid  name birthday
#     <int> <int>    <int>
#1        1     1        1
#2        2     3        2
Data.
quiz <- read.table(text = "
circleid name birthday
1 1 1
2 2 10
2 5 68
2 1 10
1 1 1
", header = TRUE)
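In dplyr 1.0.0 and later, `summarise_all()` is superseded by `across()`. The following is a sketch of the equivalent call on the sample data, assuming a dplyr version that provides `across()`; the `.names` template is an illustrative choice to get the `distinct_` prefixes the question asks for, and `result` is a hypothetical variable name.

```r
library(dplyr)

# Same sample data as in the answer above
quiz <- read.table(text = "
circleid name birthday
1 1 1
2 2 10
2 5 68
2 1 10
1 1 1
", header = TRUE)

# across(everything(), ...) applies n_distinct to every non-grouping column;
# .names controls how the output columns are labelled
result <- quiz %>%
  group_by(circleid) %>%
  summarise(across(everything(), n_distinct, .names = "distinct_{.col}"))

print(result)
```

Grouping columns are excluded from `everything()` inside a grouped `summarise()`, so `circleid` keeps its name and is not counted.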
Upvotes: 1
Reputation: 2707
You can use dplyr:
result <- quiz %>%
  group_by(circleid) %>%
  summarise_all(n_distinct)
microbenchmark for data.table and dplyr:
library(microbenchmark)
# x1 needs quiz to be a data.table (e.g. read with fread())
microbenchmark(
  x1 = quiz[, lapply(.SD, function(x) length(unique(x))), by = circleid],
  x2 = quiz %>% group_by(circleid) %>% summarise_all(n_distinct),
  times = 100
)
Unit: milliseconds
expr min lq mean median uq max neval cld
x1 150.06392 155.02227 158.75775 156.49328 158.38887 224.22590 100 b
x2 41.07139 41.90953 42.95186 42.54135 43.97387 49.91495 100 a
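Note that the data.table expression timed here uses `length(unique(x))`, not the `uniqueN` call from the data.table answer, so the numbers do not cover that variant. A rough sketch of a three-way comparison follows. It runs on synthetic data rather than the CSV from the question; the sizes, the seed, and the names `quiz_df`, `quiz_dt` and `bench` are all assumptions made for illustration, and the timings will differ by machine and package version.

```r
library(data.table)
library(dplyr)
library(microbenchmark)

set.seed(1)  # synthetic stand-in data, not the CSV from the question
n <- 1e4
quiz_df <- data.frame(
  circleid = sample(1:50, n, replace = TRUE),
  name     = sample(1:500, n, replace = TRUE),
  birthday = sample(1:365, n, replace = TRUE)
)
quiz_dt <- as.data.table(quiz_df)

# Time the two data.table variants and the dplyr version side by side
bench <- microbenchmark(
  dt_length_unique = quiz_dt[, lapply(.SD, function(x) length(unique(x))), by = circleid],
  dt_uniqueN       = quiz_dt[, lapply(.SD, uniqueN), by = circleid],
  dplyr_n_distinct = quiz_df %>% group_by(circleid) %>% summarise_all(n_distinct),
  times = 20
)
print(bench)
```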
Upvotes: 1