Reputation: 1
I am working on a project and I need my function to be as fast as possible since I have millions of datapoints slowing down my calculations. I feel my problem is quite simple, but I haven't been able to find an efficient solution so far. A simplified version of the problem is the following. Consider that you have the following values:
values_test<-c(-8, -9, 2, 1, 1,8,6,2)
And you have data of the form
c1<-c(1,2,3,4,85,6,9,3,7,7,8,9,7,9,5)
c2<-c(4,6,7,6,3,7,21,79,45,63,4,9,5,7,2)
c3<-c(8,9,21,4,9,6,5,6,3,7,12,7,3,6,7)
c4<-c(11,7,2,9,8,7,6,1,7,9,1,4,8,3,0)
c5<-c(18,2,42,47,1,7,5,5,7,9,11,96,34,63,71)
data<-cbind(c1,c2,c3,c4,c5)
where every column of the data is a variable and every row is a person. In this example I have 5 variables, but in real life I may have n variables (any positive number, since my table changes size).
I need to multiply each element of values_test by its respective column of my data object. In my case, the number of elements in values_test changes as well, and it is related to the number of variables available. For instance, consider the example where the 5th element of values_test must be multiplied by the fourth variable, and the 6th element of values_test by the fifth variable. I could do this manually with code like
value_person <- values_test[5] * data[,4]+values_test[6]*data[,5]
Although that seems easy, it does not work for me because I do not know in advance how many variables I will have. For instance, if a table includes one more covariate, then my "data" matrix will have six columns instead of five, and I will have one more value in values_test. Then I would need to do
value_person <- values_test[5] * data[,4]+values_test[6]*data[,5]+values_test[7]*data[,6]
and the sum should include more and more terms as the variables in a given table increase. Is there a way to do such an operation without using a for loop?
I thought for instance of something such as
n_col <- ncol(data)
number_variables <- n_col - 3  # the operation does not include the first 3 variables
value_person <- rowSums(values_test[(4 + 1):(4 + number_variables)] * data[, 4:n_col])
Sadly this does not work, because the values_test entries alternate within a column: the first row of a column is multiplied by 1 and the second row of the same column by 8. The value from values_test should instead be fixed for a given column (in the previous example, always 1 for data column 4 and 8 for data column 5).
I do want to avoid having a for loop.
Any help is appreciated!
Upvotes: 0
Views: 76
Reputation: 101753
Try tcrossprod(x, y), which computes x %*% t(y) in one step:
> c(tcrossprod(values_test[5:(ncol(data) + 1)], data[, -(1:3)]))
[1] 155 23 338 385 16 63 46 41 63 81 89 772 280 507 568
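If the number of columns changes from table to table, the one-liner above could be wrapped in a small helper. This is only a sketch: the function name `weighted_row_sum` and its arguments `skip` and `offset` are hypothetical (not from the question or from any package), and it assumes, as in the question, that the first 3 columns are skipped and that column j is paired with values_test[j + 1].

```r
# Hypothetical helper: generalises the matrix-product approach to any width.
# Assumption: column j of `data` is paired with values_test[j + offset].
weighted_row_sum <- function(data, values_test, skip = 3, offset = 1) {
  stopifnot(ncol(data) > skip)             # at least one column must remain
  cols <- seq(skip + 1, ncol(data))        # columns that take part in the sum
  w <- values_test[cols + offset]          # one weight per selected column
  stopifnot(!anyNA(w))                     # fail loudly if values_test is too short
  # drop = FALSE keeps a matrix even when a single column is selected
  drop(data[, cols, drop = FALSE] %*% w)
}

values_test <- c(-8, -9, 2, 1, 1, 8, 6, 2)
data <- cbind(
  c1 = c(1, 2, 3, 4, 85, 6, 9, 3, 7, 7, 8, 9, 7, 9, 5),
  c2 = c(4, 6, 7, 6, 3, 7, 21, 79, 45, 63, 4, 9, 5, 7, 2),
  c3 = c(8, 9, 21, 4, 9, 6, 5, 6, 3, 7, 12, 7, 3, 6, 7),
  c4 = c(11, 7, 2, 9, 8, 7, 6, 1, 7, 9, 1, 4, 8, 3, 0),
  c5 = c(18, 2, 42, 47, 1, 7, 5, 5, 7, 9, 11, 96, 34, 63, 71)
)

res <- weighted_row_sum(data, values_test)

# With a sixth (made-up) column, values_test[7] is picked up automatically
data6 <- cbind(data, c6 = 1:15)
res6 <- weighted_row_sum(data6, values_test)
```

Here `res` should match the output shown above, and `res6` adds `values_test[7] * data6[, 6]` without any change to the call.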
Upvotes: 2
Reputation: 79238
m <- seq(4, ncol(data))
data[,m] %*% values_test[m + 1]
      [,1]
 [1,]  155
 [2,]   23
 [3,]  338
 [4,]  385
 [5,]   16
 [6,]   63
 [7,]   46
 [8,]   41
 [9,]   63
[10,]   81
[11,]   89
[12,]  772
[13,]  280
[14,]  507
[15,]  568
colSums(t(data[,m]) * values_test[m+1])
[1] 155 23 338 385 16 63 46 41 63 81 89 772 280 507 568
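The t() in the last line matters because R recycles a shorter vector down the columns of a matrix (column-major order), which is also why the attempt in the question alternates the weights within a column. A small illustration, using a made-up 3 x 2 matrix `x` and weights `w` that are not part of the question's data:

```r
x <- matrix(1, nrow = 3, ncol = 2)   # toy matrix of ones (hypothetical data)
w <- c(10, 20)                       # one weight per column

# Recycling walks down each column, so the weights alternate within a column
wrong <- x * w
# wrong[, 1] is 10 20 10 rather than 10 10 10

# After transposing, each original column becomes a row, so w lines up with it
right <- t(t(x) * w)
# right[, 1] is 10 10 10 and right[, 2] is 20 20 20

# sweep() expresses the same column-wise scaling without the double transpose
same <- sweep(x, 2, w, `*`)
```

For millions of rows, the matrix product (`%*%` or tcrossprod) is likely the better choice, since it avoids building the intermediate scaled matrix. That is an expectation rather than a measured result, so it is worth benchmarking on the real data.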
Upvotes: 1