Mads Obi

Reputation: 560

Normalization function in R

I have a matrix that I want to transform so that every feature (column) in the transformed dataset has a mean of 0 and a variance of 1.

I have tried to use the following code:

scale <- function(train, test) 
{   
trainmean <- mean(train)
trainstd <- sd(train)
xout <- test
for (i in 1:length(train[1,])) {
    xout[,i] = xout[,i] - trainmean(i)
}
for (i in 1:lenght(train[1,])) {
    xout[,i] = xout[,i]/trainstd[i]
}

}
invisible(xout)

normalized <- scale(train, test)

This is, however, not working for me. Am I on the right track?

Edit: I am very new to the syntax!

Upvotes: 4

Views: 16469

Answers (2)

ClementWalter

Reputation: 5272

Here is another hand-written normalizing function. It avoids apply, which in my experience is slower than matrix computation:

m = matrix(rnorm(5000, 2, 3), 50, 100)

m_centred = m - m%*%rep(1,dim(m)[2])%*%rep(1, dim(m)[2])/dim(m)[2]
m_norm = m_centred/sqrt(m_centred^2%*%rep(1,dim(m)[2])/(dim(m)[2]-1))%*%rep(1,dim(m)[2])

## Verification
rowMeans(m_norm)
apply(m_norm, 1, sd)

(Note that this normalizes each row of m, not each column.)
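The same row-wise normalization can probably be written more compactly with rowMeans and rowSums. This is only a sketch, not part of the original answer: it relies on R recycling a vector of length nrow(m) down each column, so element i of the vector lines up with row i of the matrix. The seed is an assumption added for reproducibility.

```r
set.seed(42)  # assumption: fixed seed so the example is reproducible
m <- matrix(rnorm(5000, 2, 3), 50, 100)

# Subtract each row's mean; the length-50 vector is recycled column by column
m_centred <- m - rowMeans(m)

# Divide each row by its sample standard deviation (n - 1 in the denominator)
m_norm <- m_centred / sqrt(rowSums(m_centred^2) / (ncol(m) - 1))

## Verification: row means should be ~0 and row standard deviations ~1
rowMeans(m_norm)
apply(m_norm, 1, sd)
```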

Upvotes: 2

jbaums

Reputation: 27388

You can use the built-in scale function for this.

Here's an example, where we fill a matrix with random uniform variates between 0 and 1 and centre and scale them to have 0 mean and unit standard deviation:

m <- matrix(runif(1000), ncol=4)    
m_scl <- scale(m)

Confirm that the column means are 0 (within tolerance) and their standard deviations are 1:

colMeans(m_scl)
# [1] -1.549004e-16 -2.490889e-17 -6.369905e-18 -1.706621e-17

apply(m_scl, 2, sd)
# [1] 1 1 1 1

See ?scale for more details.
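The question's function takes both train and test, which suggests the goal is to scale a test set with the training set's statistics. A minimal sketch of that, assuming both matrices have the same columns: scale() stores the centring and scaling values it used in the "scaled:center" and "scaled:scale" attributes, and they can be passed back in through its center and scale arguments. The matrices below are made-up illustration data.

```r
set.seed(1)  # assumption: fixed seed for reproducibility
train <- matrix(rnorm(200, mean = 5, sd = 2), ncol = 4)  # illustrative data
test  <- matrix(rnorm(40,  mean = 5, sd = 2), ncol = 4)  # illustrative data

# Scale the training set and recover the column means and sds that were used
train_scl <- scale(train)
train_center <- attr(train_scl, "scaled:center")
train_sd     <- attr(train_scl, "scaled:scale")

# Apply the *training* means and sds to the test set
test_scl <- scale(test, center = train_center, scale = train_sd)

colMeans(test_scl)  # close to, but generally not exactly, 0
```

Note that naming your own function scale, as in the question, masks the built-in one, so a different name is probably safer.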

To write your own normalisation function, you could use:

my_scale <- function(x) {
  # Operate on the argument x, not on the global m
  apply(x, 2, function(col) {
    (col - mean(col))/sd(col)
  })
}

m_scl <- my_scale(m)

or the following, which is probably faster on larger matrices:

my_scale <- function(x) sweep(sweep(x, 2, colMeans(x)), 2, apply(x, 2, sd), '/')
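As a quick sanity check, both hand-written variants can be compared against the built-in scale. This is a sketch only; the names my_scale_apply and my_scale_sweep are hypothetical, used here just to keep the two versions apart.

```r
set.seed(123)  # assumption: fixed seed for reproducibility
m <- matrix(runif(1000), ncol = 4)

# apply-based version (hypothetical name)
my_scale_apply <- function(x) {
  apply(x, 2, function(col) (col - mean(col)) / sd(col))
}

# sweep-based version (hypothetical name)
my_scale_sweep <- function(x) {
  sweep(sweep(x, 2, colMeans(x)), 2, apply(x, 2, sd), "/")
}

# scale() attaches extra attributes, so compare the values only
all.equal(my_scale_apply(m), scale(m), check.attributes = FALSE)
all.equal(my_scale_sweep(m), my_scale_apply(m))
```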

Upvotes: 11
