user3251223
user3251223

Reputation: 265

Counting unique values across variables (columns) in R

I have a large dataset with repeated measures over 5 time periods.

   2012  2009  2006  2003  2000
    3     1     4     4     1
    5     3     2     2     3
    6     7     3     5     6 

I want to add a new column, which is the number of unique values among years 2000 to 2012. e.g.,

   2012  2009  2006  2003  2000  nunique
    3     1     4     4     1      3
    5     3     2     2     3      3
    6     7     3     5     6      4

I am working in R and, if it helps, there are only 14 possible different values of the measured value at each time period.

I found this page: Count occurrences of value in a set of variables in R (per row) and tried the various solutions offered on it. What it gives me however is a count of each value, not the number of unique values. Other similar questions on here seem to ask about counting number of unique values within a variable /column, rather than across each row. Any suggestions would be appreciated.

Upvotes: 1

Views: 2743

Answers (4)

Mudit Gupta
Mudit Gupta

Reputation: 441

try this one out:

sapply(data, function(x) length(unique(x)))

Upvotes: 0

Michael Lawrence
Michael Lawrence

Reputation: 1021

If you have a large dataset, you may want to avoid looping over the rows, but use a faster framework, like S4Vectors:

df <- data.frame('2012'=c(3,5,6),
             '2009'=c(1,3,7),
             '2006'=c(4,2,3),
             '2003'=c(4,2,5),
             '2000'=c(1,3,6))

dup <- S4Vectors:::duplicatedIntegerPairs(as.integer(as.matrix(df)), row(df))
dim(dup) <- dim(df)
rowSums(!dup)

Or, the matrixStats package:

m <- as.matrix(df)
mode(m) <- "integer"
rowSums(matrixStats::rowTabulates(m) > 0)

Upvotes: 1

Sam Danziger
Sam Danziger

Reputation: 106

The trick is to use 'apply' and assign each row to a variable (e.g. x). You can then write a custom function, in this case one that uses 'unique' and 'length' to get the answer that you want.

df <- data.frame('2012'=c(3,5,6), '2009'=c(1,3,7), '2006'=c(4,2,3), '2003'=c(4,2,5), '2000'=c(1,3,6))

df$nunique = apply(df, 1, function(x) {length(unique(x))})

Upvotes: 0

Jilber Urbina
Jilber Urbina

Reputation: 61214

Here's one alternative

> df$nunique <- apply(df, 1, function(x) length(unique(x)))
> df
  2012 2009 2006 2003 2000 nunique
1    3    1    4    4    1       3
2    5    3    2    2    3       3
3    6    7    3    5    6       4

Upvotes: 2

Related Questions