Reputation: 1279
I want to check that several columns are consistent for each ID (they are supposed to be constant, but I have some doubts about the data, so I want to double-check).
For example, given the following data frame:
test <- data.frame(ID = c("one","two","three"),
                   a = c(1,1,1),
                   b = c(1,1,1),
                   c = c(NA,1,1),
                   d = c(2,4,1))
I want to check that columns a, b, c and d all hold the same value within each row, disregarding missing values. I thought I could do this by counting the unique values in the relevant columns, so that I can then select only the rows where the number of unique values is more than 1. I imagine this is not the best way of doing it, but it was the only way I could think of with my limited knowledge.
I found this question here, which seems to be similar to what I want to do: Find unique values across a row of a data frame
But I am struggling to apply the answers to my data. I have tried the following, which didn't do anything (I've never used a for-loop before, so I've probably done that wrong), although when I run the body of the function on its own for a single row it does exactly what I hope for:
yeartest <- function(x){
  temp <- test[x,2:5]
  temp <- as.numeric(temp)
  veclength <- length(unique(temp[!is.na(temp)]))
  temp2 <- c(temp,veclength)
  test[,"thing"] <- NA
  test[x,2:6] <- temp2
}
for(i in 1:nrow(test)){
  yeartest(i)
}
Then I tried to adapt the accepted answer from that question:
x <- test
# dups <- function(x) x[!duplicated(x)]
yeartest <- function(x){
  # x <- 1
  temp <- test[x,2:5]
  temp <- as.numeric(temp)
  veclength <- length(unique(temp[!is.na(temp)]))
  temp2 <- c(temp,veclength)
  test[,"thing"] <- NA
  test[x,2:6] <- temp2
}
new.df <- t(apply(x, 1, function(x) yeartest(x)))
This gives an error, so I have clearly made a mistake in adapting the answer to my data.
Apologies if this is a really obvious failing on my part. I am very grateful for any help.
Solution: (thank you for the help!)
test$new <- apply(test[,2:5],1,function(r) length(unique(na.omit(r))))
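For completeness, here is a minimal sketch of the follow-up step described in the question: keeping only the rows whose count is greater than 1. It rebuilds the example data (with the third value column named `c`, as in the prose), and the name `inconsistent` is just an illustrative choice.

```r
# Example data from the question; columns 2:5 are the ones to compare
test <- data.frame(ID = c("one", "two", "three"),
                   a = c(1, 1, 1),
                   b = c(1, 1, 1),
                   c = c(NA, 1, 1),
                   d = c(2, 4, 1))

# Count the distinct non-missing values in each row
test$new <- apply(test[, 2:5], 1, function(r) length(unique(na.omit(r))))

# Keep only the rows where the columns disagree (more than one distinct value)
inconsistent <- test[test$new > 1, ]
print(inconsistent)
```

With this data, rows "one" and "two" are flagged, while row "three" is consistent.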
Upvotes: 1
Views: 1147
Reputation: 20045
> df <- data.frame(
    a=sample(2,10,replace=TRUE),
    b=sample(2,10,replace=TRUE),
    c=sample(c("a","b"),10,replace=TRUE),
    d=sample(c("a","b"),10,replace=TRUE))
> df[c(3,6,8),1] <- NA
> df
    a b c d
1   1 2 a b
2   1 2 a b
3  NA 2 a a
4   2 2 a b
5   1 2 a a
6  NA 1 a b
7   2 1 b b
8  NA 1 a a
9   1 1 b b
10  2 2 b b
> apply(df,1,function(r) length(unique(na.omit(r))))
[1] 3 3 2 4 3 2 4 2 3 3
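As for why the loop in the question appears to do nothing: assigning to `test` inside a function only changes a local copy, which is discarded when the function returns, so the global `test` is never touched. (The `apply()` attempt fails for a different reason: `apply()` hands the function the row's values rather than a row index.) Below is a rough sketch of a loop that does work, assuming the question's example data; `count_unique` is a hypothetical helper name, not something from the original post.

```r
# Example data from the question (third value column named `c` here)
test <- data.frame(ID = c("one", "two", "three"),
                   a = c(1, 1, 1),
                   b = c(1, 1, 1),
                   c = c(NA, 1, 1),
                   d = c(2, 4, 1))

# Hypothetical helper: return the count instead of trying to modify `test`
count_unique <- function(i) {
  vals <- unlist(test[i, 2:5], use.names = FALSE)  # row i as a plain vector
  length(unique(vals[!is.na(vals)]))               # distinct non-missing values
}

# Create the result column once, then fill it from the calling scope
test$thing <- NA_integer_
for (i in seq_len(nrow(test))) {
  test$thing[i] <- count_unique(i)
}
print(test)
```

The key design choice is that the function only computes and returns a value, and the assignment into `test` happens outside it. The `apply()` one-liner does the same thing without the explicit loop.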
Upvotes: 3