Froom2

Reputation: 1279

Counting unique values across a row

I want to check that the columns are consistent for each ID number. They're supposed to be constant, but there may be some doubt in the data, so I want to double-check.

For example, given the following data frame:

test <- data.frame(ID = c("one", "two", "three"),
                   a = c(1, 1, 1),
                   b = c(1, 1, 1),
                   c = c(NA, 1, 1),
                   d = c(2, 4, 1))

I want to check that columns a, b, c and d all hold the same value within each row, ignoring missing values. I thought I could do this by counting the unique non-missing values in those columns and then selecting only the rows where that count is greater than 1. I suspect this isn't the best way to do it, but it was the only approach I could think of with my limited knowledge.
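A minimal sketch of the counting idea on a single row, using a hypothetical vector `row_vals` that holds row "one" of the example (columns a, b, c and d). It shows why the missing values have to be dropped before counting: `unique()` treats `NA` as a value in its own right.

```r
# Hypothetical example: the values of row "one" (columns a, b, c, d)
row_vals <- c(1, 1, NA, 2)

unique(row_vals)                    # 1, NA, 2: unique() keeps NA as a value
length(unique(row_vals))            # 3, so the NA would be counted
length(unique(na.omit(row_vals)))   # 2: only 1 and 2 remain after dropping NA
```

A count above 1 after dropping `NA` means the row is inconsistent.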

I found this question, which seems similar to what I want to do: Find unique values across a row of a data frame

But I am struggling to apply the answers to my data. I tried the following, which didn't do anything (I've never used a for loop before, so I've probably done that wrong), although when I run the body of the function on its own for a single row it does exactly what I hoped for:

yeartest <- function(x){
  temp <- test[x,2:5]
  temp <- as.numeric(temp)
  veclength <- length(unique(temp[!is.na(temp)]))
  temp2 <- c(temp,veclength)
  test[,"thing"] <- NA
  test[x,2:6] <- temp2
}

for(i in 1:nrow(test)){
  yeartest(i)
}
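The loop has no visible effect because assigning to `test` inside a function creates and modifies a local copy of the data frame, which is discarded when the function returns (and `test[,"thing"] <- NA` would reset the column on every call anyway). A sketch of one way around this, assuming the helper simply returns the count and the loop does the assignment at top level; `count_unique` is a hypothetical name, not something from the original post:

```r
test <- data.frame(ID = c("one", "two", "three"),
                   a = c(1, 1, 1),
                   b = c(1, 1, 1),
                   c = c(NA, 1, 1),
                   d = c(2, 4, 1))

# Hypothetical helper: return the number of distinct non-missing values in
# columns 2:5 of row x, instead of modifying the data frame inside the function
count_unique <- function(df, x) {
  vals <- unlist(df[x, 2:5], use.names = FALSE)
  length(unique(vals[!is.na(vals)]))
}

# Create the column once, outside the loop, then fill it row by row
test$thing <- NA
for (i in seq_len(nrow(test))) {
  test$thing[i] <- count_unique(test, i)
}

test$thing   # 2 2 1
```

Because the assignment `test$thing[i] <- ...` happens outside the function, it updates the `test` that the rest of the script sees.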

Then I tried to adapt the accepted answer:

x <- test
# dups <- function(x) x[!duplicated(x)]
yeartest <- function(x){
  #   x <- 1
  temp <- test[x,2:5]
  temp <- as.numeric(temp)
  veclength <- length(unique(temp[!is.na(temp)]))
  temp2 <- c(temp,veclength)
  test[,"thing"] <- NA
  test[x,2:6] <- temp2
}

new.df <- t(apply(x, 1, function(x) yeartest(x)))

This gives an error, so it's pretty clear that I've made a mistake in translating the answer to my data.
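One reason this attempt fails: `apply()` passes each row to the function as a vector of that row's values, not as a row number, so `yeartest(x)` receives values where it expects an index. In addition, because `ID` is a character column, `apply()` first converts the whole data frame to a character matrix. A small sketch that shows this, assuming the example data with the fourth column named `c`:

```r
test <- data.frame(ID = c("one", "two", "three"),
                   a = c(1, 1, 1),
                   b = c(1, 1, 1),
                   c = c(NA, 1, 1),
                   d = c(2, 4, 1))

# With ID included, every row arrives as a character vector of values
apply(test, 1, class)          # "character" for every row

# Passing only the numeric columns keeps each row a numeric vector
apply(test[, 2:5], 1, class)   # "numeric" for every row
```

This is why the function given to `apply()` should work directly on its argument `r` (the row's values) rather than use it to index back into `test`.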

Apologies if this is a really obvious mistake on my part; I'm very grateful for any help.

Solution (thank you for the help!):

test$new <- apply(test[,2:5],1,function(r) length(unique(na.omit(r))))
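With the count stored in `test$new`, the rows that need a closer look are the ones with more than one distinct value. A sketch of that final filtering step, assuming the example data (fourth column named `c`); `inconsistent` is just an illustrative name:

```r
test <- data.frame(ID = c("one", "two", "three"),
                   a = c(1, 1, 1),
                   b = c(1, 1, 1),
                   c = c(NA, 1, 1),
                   d = c(2, 4, 1),
                   stringsAsFactors = FALSE)

# Number of distinct non-missing values in columns 2:5 of each row
test$new <- apply(test[, 2:5], 1, function(r) length(unique(na.omit(r))))

# Keep only the rows whose supposedly constant columns disagree
inconsistent <- test[test$new > 1, ]
inconsistent$ID   # "one" "two"
```

Rows "one" and "two" are flagged because column d (2 and 4) differs from the other columns.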

Upvotes: 1

Views: 1147

Answers (1)

Raffael

Reputation: 20045

> df <- data.frame(
    a=sample(2,10,replace=TRUE),
    b=sample(2,10,replace=TRUE),
    c=sample(c("a","b"),10,replace=TRUE),
    d=sample(c("a","b"),10,replace=TRUE))

> df[c(3,6,8),1] <- NA

> df
    a b c d
1   1 2 a b
2   1 2 a b
3  NA 2 a a
4   2 2 a b
5   1 2 a a
6  NA 1 a b
7   2 1 b b
8  NA 1 a a
9   1 1 b b
10  2 2 b b

> apply(df,1,function(r) length(unique(na.omit(r))))
 [1] 3 3 2 4 3 2 4 2 3 3

Upvotes: 3
