rg255

Reputation: 4169

create equation function acting across rows in R

I have a dataframe similar to the one this creates:

dummy <- data.frame(Num = c(1,2,3,4), Let = c("a","b","c","d"))
dummy$X1=rnorm(4,35,6)
dummy$X2=rnorm(4,35,6)
dummy$X3=rnorm(4,35,6)
dummy$X4=rnorm(4,35,6)
dummy$X5=rnorm(4,35,6)
dummy$X6=rnorm(4,35,6)
dummy$X7=rnorm(4,35,6)
dummy$X8=rnorm(4,35,6)
dummy$X9=rnorm(4,35,6)
dummy$X10=rnorm(4,35,6)
dummy$Xmax=apply(dummy[3:12],1,max)

The real data set is much larger, though: roughly 260*13000 cells.

What I aim to do is apply the equation below to each row, across a set of columns selected by index, e.g. data[x:y] (in the example, the columns dummy[3:12]).

TSP = Sum( (1-(Xi/Xmax)) /(n-1))

where Xi is each individual value within the row among the columns of interest (i indexes the columns, i.e. there is an X1, an X2, an X3... value for each row), Xmax is the largest of those values in the row (as stored in the dummy$Xmax column), and n is the number of columns selected (in the example, n=10). In the actual data set I will be selecting 26 columns.

I would like to create a tidy little function which performs this calculation for all 13000 rows and stores each row's value in a column called dummy$TSP.

One crude solution is the following, but as I said, I would like to turn this into some kind of tidy function where I can select the columns and the rest is (nearly) automatic.

dummy$TSP <- ( (1 - dummy$X1/dummy$Xmax)/(10 - 1)
             + (1 - dummy$X2/dummy$Xmax)/(10 - 1)
                       ...
             + (1 - dummy$X10/dummy$Xmax)/(10 - 1) )

I would also really appreciate answers which explain the process well so I will be more likely to be able to learn, thanks in advance!

Upvotes: 0

Views: 312

Answers (2)

Simon O'Hanlon

Reputation: 60000

If you know which columns you want the function to use, you can, as you suspect, use apply to run the function over the rows of just those columns, like so:

# Columns you want to use for this function
cols <- 3:12   # X1 to X10 only; column 13 is Xmax and must not be counted in n

# Use apply to loop over rows
dummy$TSP <- apply( dummy[,cols] , 1 , FUN = function(x){ sum( ( 1 - ( x / max(x) ) ) / (length(x) - 1) ) } )

R is vectorised: apply passes each row to the function as the argument x, which will be a vector of 10 numbers, and any arithmetic we perform on x is carried out on each element of that vector.

So x/max(x) returns a vector of 10 numbers: the value from each column of that row divided by the maximum value across those columns for that row. Each element of 1 - x/max(x) is then divided by the number of columns minus 1. Finally, sum adds these results into the single value that the function returns.
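
To make the steps above concrete, here is a minimal sketch that wraps the same apply call in a reusable function and checks it on a tiny data frame whose result is easy to verify by hand. The function name tsp_rows and the toy data are assumptions made for illustration; they are not part of the original question or answer.

```r
# Hypothetical helper (name is illustrative): TSP for each row over the chosen columns.
tsp_rows <- function(data, cols) {
  # drop = FALSE keeps a data frame even if a single column is selected
  apply(data[, cols, drop = FALSE], 1, function(x) {
    # x is one row as a numeric vector; the arithmetic is element-wise
    sum((1 - x / max(x)) / (length(x) - 1))
  })
}

# Toy data (assumed values, chosen so the arithmetic is easy to follow)
toy <- data.frame(Let = c("a", "b"),
                  X1 = c(2, 5), X2 = c(4, 5), X3 = c(1, 5))

# Step by step for the first row: x = (2, 4, 1), max(x) = 4, n = 3
x <- c(2, 4, 1)
step1 <- 1 - x / max(x)            # 0.50 0.00 0.75
step2 <- step1 / (length(x) - 1)   # 0.250 0.000 0.375
row1  <- sum(step2)                # 0.625

toy$TSP <- tsp_rows(toy, c("X1", "X2", "X3"))
```

The second row has three equal values, so every term is 1 - 1 = 0 and its TSP is 0. Columns can be selected by name, as here, or by position (e.g. 3:12 for the dummy data).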

Upvotes: 1

adibender

Reputation: 7578

A more vectorized solution is to apply the inner calculation to all elements at once and then sum each row with the efficient rowSums function, like this:

vars.to.use <- paste0("X", 1:10)
dummy$TSP <- rowSums((1-(dummy[vars.to.use]/dummy$Xmax))/(length(vars.to.use) - 1))
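
The same idea can be sketched as a small function that also computes the row maximum itself, so it does not depend on a precomputed Xmax column. The name tsp_vec, the use of do.call(pmax, ...) for the row-wise maximum, and the fixed seed are assumptions added for illustration. The cross-check at the end compares the result against the row-by-row apply approach.

```r
# Hypothetical wrapper around the rowSums approach (name is illustrative).
tsp_vec <- function(data, vars) {
  x    <- as.matrix(data[vars])       # numeric matrix of the selected columns
  xmax <- do.call(pmax, data[vars])   # row-wise maximum, one value per row
  # x / xmax recycles xmax down each column, so every row is divided by its own maximum
  rowSums((1 - x / xmax) / (length(vars) - 1))
}

# Rebuild data shaped like the question's example (seed fixed only for reproducibility)
set.seed(1)
dummy <- data.frame(Num = 1:4, Let = c("a", "b", "c", "d"))
vars.to.use <- paste0("X", 1:10)
for (v in vars.to.use) dummy[[v]] <- rnorm(4, 35, 6)

dummy$TSP <- tsp_vec(dummy, vars.to.use)

# Cross-check against the apply() version
by_row <- apply(dummy[vars.to.use], 1,
                function(x) sum((1 - x / max(x)) / (length(x) - 1)))
same <- isTRUE(all.equal(unname(dummy$TSP), unname(by_row)))
```

On a data set with 13000 rows, the rowSums version avoids calling an R function once per row, which is usually the reason to prefer it over apply.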

Upvotes: 1
