I want to count the number of unique values in each column, grouped by the values of var_1.
For example:
Test <- data.frame(var_1 = c("a","a","a", "b", "b", "c", "c", "c", "c", "c"), var_2 = c("bl","bf","bl", "bl","bf","bl","bl","bf","bc", "bg" ), var_3 = c("cf","cf","eg", "cf","cf","eg","cf","dr","eg","fg"))
Grouped by var_1, the result I am looking for is:
var_1 var_2 var_3
    a     2     2
    b     2     1
    c     4     4
I have tried various methods (including apply and table). aggregate has come closest, but the following counts all entries for each value of var_1, not just the unique ones:
agbyv1 <- aggregate(. ~ var_1, Test, length)
var_1 var_2 var_3
    a     3     3
    b     2     2
    c     5     5
I tried
unqbyv1= aggregate(. ~ var_1, Test, length(unique(x)))
but that didn't work.
Any help is greatly appreciated.
Try
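library(dplyr)
# summarise_each() and funs() are deprecated in current dplyr;
# across() (dplyr >= 1.0.0) applies n_distinct to every non-grouping column
Test %>%
  group_by(var_1) %>%
  summarise(across(everything(), n_distinct))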
Or
library(data.table)  # v1.9.5+ (provides uniqueN)
# setDT() converts Test to a data.table in place
setDT(Test)[, lapply(.SD, uniqueN), by = var_1]
uniqueN() counts NA as a distinct value, so if there are NAs, drop them first:
setDT(Test)[, lapply(.SD, function(x) uniqueN(na.omit(x))), var_1]
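Or you can use aggregate. Its default is na.action = na.omit, so rows containing NAs are already removed and no modification is needed (note that a row is dropped if any of its columns is NA, not only the column being counted). This is also the fix for the attempt in the question: FUN must be a function, so length(unique(x)) has to be wrapped in function(x).
aggregate(. ~ var_1, Test, FUN = function(x) length(unique(x)))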
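To count NAs per column instead of per row, one option is sketched below in base R on a small hypothetical data frame `with_na` (not from the question). It passes `na.action = na.pass` so every row is kept and removes NAs inside `FUN`; `count_unique` is an illustrative helper name.

```r
# Hypothetical data with NAs in different rows (not from the question)
with_na <- data.frame(
  var_1 = c("a", "a", "a", "b", "b"),
  var_2 = c("bl", NA, "bf", "bl", "bl"),
  var_3 = c("cf", "cf", NA, "cf", "dr"),
  stringsAsFactors = FALSE
)

# Illustrative helper: number of distinct non-NA values
count_unique <- function(x) length(unique(na.omit(x)))

# Default na.action = na.omit: rows 2 and 3 are dropped entirely,
# so group "a" is counted from row 1 only
by_row <- aggregate(. ~ var_1, with_na, FUN = count_unique)

# na.action = na.pass keeps every row; NAs are removed per column inside FUN
by_col <- aggregate(. ~ var_1, with_na, FUN = count_unique, na.action = na.pass)
```

Here by_row reports 1 distinct var_2 value for group a, because rows 2 and 3 were discarded, while by_col reports 2.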
Try this:
apply(Test[-1], 2, function(y) tapply(y, Test$var_1, function(x) length(unique(x))))
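A variant of the same tapply idea, sketched in base R: iterating over the columns with sapply() instead of apply() avoids converting Test[-1] to a matrix, which coerces all columns to one common type. `count_by_group` is an illustrative helper name, not part of any package.

```r
Test <- data.frame(
  var_1 = c("a", "a", "a", "b", "b", "c", "c", "c", "c", "c"),
  var_2 = c("bl", "bf", "bl", "bl", "bf", "bl", "bl", "bf", "bc", "bg"),
  var_3 = c("cf", "cf", "eg", "cf", "cf", "eg", "cf", "dr", "eg", "fg")
)

# Illustrative helper: distinct values of one column within each level of var_1
count_by_group <- function(y) tapply(y, Test$var_1, function(x) length(unique(x)))

# sapply() walks the data frame's columns directly, so each keeps its own type;
# the per-column results (one count per group) are simplified into a matrix
res <- sapply(Test[-1], count_by_group)
res
#   var_2 var_3
# a     2     2
# b     2     1
# c     4     4
```

The result is a matrix with one row per value of var_1 and one column per counted variable.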