Evan Zamir

Reputation: 8481

For loop not counting correctly

I can't for the life of me figure out what is going on here. I have a data frame that has several thousand rows. One of the columns is "name" and the other columns hold various factors. I'm trying to count how many unique rows (i.e. distinct combinations of factors) belong to each "name".

Here is the loop that I am running as a script:

names<-as.matrix(unique(all.rows$name))
count<-matrix(1:length(names))
for (i in 1:length(names)) {
  count[i]<-dim(unique(subset(all.rows,name==names[i])[,c(1,3,4,5)]))[1]
}

When I run the line from inside the for loop at the console, replacing "i" with an arbitrary number (e.g. 10, 27, 40, ...), it gives me the correct count. But when the same line runs inside the for loop, the counts all come out the same. I can't figure out why it's not working. Any ideas?

Upvotes: 0

Views: 465

Answers (2)

nograpes

Reputation: 18323

Your code works for me:

# Sample data.
set.seed(1)
n <- 10000
all.rows <- data.frame(a = sample(LETTERS, n, replace = TRUE),
                       b = sample(LETTERS, n, replace = TRUE),
                       name = sample(LETTERS, n, replace = TRUE))

names<-as.matrix(unique(all.rows$name))
count<-matrix(1:length(names))
for (i in 1:length(names)) {
  count[i]<-dim(unique(subset(all.rows,name==names[i])[,c(1,2)]))[1]
}
t(count)

If you want to stick with a for loop, this is a little clearer:

count <- c()
for (i in unique(all.rows$name))
  count[i] <- nrow(unique(all.rows[all.rows$name == i, names(all.rows) != 'name']))
count

But using by would be very concise:

c(by(all.rows,all.rows$name,function(x) nrow(unique(x))))
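
Another way to get the same kind of count, shown here only as a minimal sketch on made-up data: drop the duplicated rows once, then tabulate the names that remain. The data frame df and its columns a and b are hypothetical stand-ins for all.rows and its factor columns.

```r
# Toy data (hypothetical), small enough to check by hand.
df <- data.frame(
  name = c("x", "x", "x", "y", "y"),
  a    = c("A", "A", "B", "A", "A"),
  b    = c("C", "C", "C", "D", "D"),
  stringsAsFactors = FALSE
)

# Keep one copy of each distinct (name, a, b) row ...
unique_rows <- unique(df[, c("name", "a", "b")])

# ... then count how many distinct rows are left for each name.
counts <- table(unique_rows$name)
counts
```

Because unique() runs once on the whole data frame rather than once per name, there is no loop index to get wrong.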

Upvotes: 2

user697473

Reputation: 2293

You can do this with much simpler code. Try just pasting together the factor values in each row and then using tapply. Here is a working example:

data(trees)
trees$name <- rep(c('elm', 'oak'), length.out = nrow(trees))
trees$HV   <- with(trees, paste(Height, Volume))
tapply(trees$HV, trees$name, function (x) length(unique(x)))

The last command gives you the counts that you need. As far as I can tell, the analogous code given your variable names is:

# MARGIN = 1 pastes across each row; MARGIN = 2 would collapse each column instead.
all.rows$factorCombo <- apply(all.rows[, c(1, 3:5)], 1, function (x) paste(x, collapse = ' '))
tapply(all.rows$factorCombo, all.rows$name, function (x) length(unique(x)))
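
One thing to watch with the pasting approach is the separator: without one, different rows can collapse to the same string. Here is a small sketch of that on made-up data. The data frame df, its columns f1 and f2, and the "|" separator are all assumptions chosen for illustration, and the separator only helps if it does not occur in the factor values themselves.

```r
# Toy data (hypothetical): the two "x" rows differ, but both paste to "abc"
# when no separator is used.
df <- data.frame(
  name = c("x", "x", "y", "y"),
  f1   = c("ab", "a", "p", "p"),
  f2   = c("c", "bc", "q", "q"),
  stringsAsFactors = FALSE
)

# do.call(paste, ...) pastes the selected columns together row by row.
combo_nosep <- do.call(paste, c(df[, c("f1", "f2")], sep = ""))
combo_sep   <- do.call(paste, c(df[, c("f1", "f2")], sep = "|"))

# Count distinct combinations per name, with and without a separator.
res_nosep <- tapply(combo_nosep, df$name, function(x) length(unique(x)))
res_sep   <- tapply(combo_sep,   df$name, function(x) length(unique(x)))
```

With no separator the two distinct "x" rows are counted as one; with the separator they are counted separately.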

Upvotes: 2
