Reputation: 612
I have these loops:
xall = data.frame()
for (k in 1:nrow(VectClasses))
{
for (i in 1:nrow(VectIndVar))
{
xall[i,k] = sum(VectClasses[k,] == VectIndVar[i,])
}
}
The data:
VectClasses = data frame containing the characteristics of each class
VectIndVar = data frame containing each record of the database
The two for loops work and give an output I can work with, but they take too long, so I would like to replace them with a function from the apply family.
The output I am looking for looks like this:
V1 V2 V3 V4
1 3 3 2 2
2 2 2 1 1
3 3 4 3 3
4 3 4 3 3
5 4 4 3 3
6 3 2 3 3
I tried using:
xball = data.frame()
xball = sapply(xball, function (i,k){
sum(VectClasses[k,] == VectIndVar[i,])})
xcall = data.frame()
xcall = lapply(xcall, function (i, k){sum(VectClasses[k,] == VectIndVar[i,])})
but neither seems to fill the data frame.
Reproducible data (shortened):
VectIndVar <- data.frame(a=sample(letters[1:5], 100, rep=T), b=floor(runif(100)*25),
c = sample(c(1:5), 100, rep=T),
d=sample(c(1:2), 100, rep=T))
and:
K1 = 4
VectClasses = VectIndVar[sample(1:nrow(VectIndVar), K1, replace=FALSE), ]
Can you help me?
Upvotes: 3
Views: 4838
Reputation: 66819
I would use outer instead of *apply:
res <- outer(
1:nrow(VectIndVar),
1:nrow(VectClasses),
Vectorize(function(i,k) sum(VectIndVar[i,-1]==VectClasses[k,-1]))
)
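(Thanks to this Q&A for clarifying that Vectorize is needed: outer calls the function once with the full vectors of i and k rather than once per pair, so the function has to return one value per pair.)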
This gives
> head(res) # with set.seed(1) before creating the data
[,1] [,2] [,3] [,4]
[1,] 1 1 2 1
[2,] 0 0 1 0
[3,] 0 0 0 0
[4,] 0 0 1 0
[5,] 1 0 0 1
[6,] 1 1 1 1
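On the Vectorize point, here is a minimal sketch with a toy function (f, err and ok are illustrative names, not part of the code above). It should show that a function built around sum() collapses everything outer passes in to a single number, so outer cannot shape the result into a matrix, while the Vectorize wrapper evaluates the function once per (i, k) pair.

```r
# outer() hands FUN two equal-length vectors holding every (i, k) combination
# and expects one result per combination back.
f <- function(i, k) sum(i == k)  # sum() returns a single number

# Without Vectorize, outer() cannot reshape that single number and stops with an error.
err <- tryCatch(outer(1:3, 1:2, f), error = function(e) conditionMessage(e))

# With Vectorize, f is called once per pair, so there is one value per cell.
ok <- outer(1:3, 1:2, Vectorize(f))
print(err)
print(ok)
```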
As for speed, I would suggest using matrices instead of data.frames:
# drop the first column (a) and keep it as row names instead
cmat <- as.matrix(VectClasses[-1])
rownames(cmat) <- VectClasses$a
imat <- as.matrix(VectIndVar[-1])
rownames(imat) <- VectIndVar$a
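One way those matrices might then be used is sketched below. It rebuilds the question's sample data with set.seed(1) so it runs on its own; the names res_mat and res_cols are assumptions of this sketch, and the row names are left out because they do not affect the counts. The first version is the same outer call pointed at the matrices. The second is an alternative that avoids the per-pair function calls by comparing one column at a time and adding up the resulting logical matrices. I have not timed either on the real data.

```r
# Rebuild the sample data from the question (seed chosen only for repeatability)
set.seed(1)
VectIndVar <- data.frame(a = sample(letters[1:5], 100, replace = TRUE),
                         b = floor(runif(100) * 25),
                         c = sample(1:5, 100, replace = TRUE),
                         d = sample(1:2, 100, replace = TRUE))
K1 <- 4
VectClasses <- VectIndVar[sample(nrow(VectIndVar), K1), ]

# Numeric matrices without the first column, as suggested above
cmat <- as.matrix(VectClasses[-1])
imat <- as.matrix(VectIndVar[-1])

# Same outer() call as before, but indexing matrix rows instead of data frame rows
res_mat <- outer(seq_len(nrow(imat)), seq_len(nrow(cmat)),
                 Vectorize(function(i, k) sum(imat[i, ] == cmat[k, ])))

# Alternative: for each column, compare all records against all classes at once,
# then add the per-column TRUE/FALSE matrices to get the match counts
res_cols <- Reduce("+", lapply(seq_len(ncol(imat)),
                               function(j) outer(imat[, j], cmat[, j], "==")))

head(res_mat)
```

Both results have one row per record and one column per class, which is the layout of xall in the question.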
Upvotes: 6