Reputation: 265
I have a large dataset with repeated measures over 5 time periods.
2012 2009 2006 2003 2000
3 1 4 4 1
5 3 2 2 3
6 7 3 5 6
I want to add a new column, which is the number of unique values among years 2000 to 2012. e.g.,
2012 2009 2006 2003 2000 nunique
3 1 4 4 1 3
5 3 2 2 3 3
6 7 3 5 6 4
I am working in R and, if it helps, there are only 14 possible different values of the measured value at each time period.
I found this page: Count occurrences of value in a set of variables in R (per row) and tried the various solutions offered on it. What it gives me however is a count of each value, not the number of unique values. Other similar questions on here seem to ask about counting number of unique values within a variable /column, rather than across each row. Any suggestions would be appreciated.
Upvotes: 1
Views: 2743
Reputation: 1021
If you have a large dataset, you may want to avoid looping over the rows, but use a faster framework, like S4Vectors:
df <- data.frame('2012'=c(3,5,6),
'2009'=c(1,3,7),
'2006'=c(4,2,3),
'2003'=c(4,2,5),
'2000'=c(1,3,6))
dup <- S4Vectors:::duplicatedIntegerPairs(as.integer(as.matrix(df)), row(df))
dim(dup) <- dim(df)
rowSums(!dup)
Or, the matrixStats package:
m <- as.matrix(df)
mode(m) <- "integer"
rowSums(matrixStats::rowTabulates(m) > 0)
Upvotes: 1
Reputation: 106
The trick is to use 'apply' and assign each row to a variable (e.g. x). You can then write a custom function, in this case one that uses 'unique' and 'length' to get the answer that you want.
df <- data.frame('2012'=c(3,5,6), '2009'=c(1,3,7), '2006'=c(4,2,3), '2003'=c(4,2,5), '2000'=c(1,3,6))
df$nunique = apply(df, 1, function(x) {length(unique(x))})
Upvotes: 0
Reputation: 61214
Here's one alternative
> df$nunique <- apply(df, 1, function(x) length(unique(x)))
> df
2012 2009 2006 2003 2000 nunique
1 3 1 4 4 1 3
2 5 3 2 2 3 3
3 6 7 3 5 6 4
Upvotes: 2