Reputation: 369
I recently started with R and I would like to scale my data matrix. I found a way to do that here: Scale a series between two points
x <- data.frame(step = c(1,2,3,4,5,6,7,8,9,10))
normalized <- (x-min(x))/(max(x)-min(x))
As my data consists of several columns, of which I only want to normalize some, wrapping the formula in a function was suggested:
normalized <- function(x) (x- min(x))/(max(x) - min(x))
x[] <- lapply(x, normalized)
Additionally, I realized that some of the data points in my dataset are NA, so the formula above no longer works. I added an extension suggested here: scaling r dataframe to 0-1 with NA values
normalized <- function(x, ...) {(x - min(x, ...)) / (max(x, ...) - min(x, ...))}
But I don't understand how to apply it. For example, I would like to normalize columns 4, 5, 6 and 10 and leave the remaining columns as they are in the data set. I tried it for column 4:
data <- lapply(data[,4],normalized,na.rm= TRUE)
But it did not work: the result was a list instead of a data frame :-(. Does anybody know how I could fix it?
Thanks a lot already in advance!
Upvotes: 5
Views: 14186
Reputation: 5017
Try this; I have modified the normalized function to take NA values into account:
db<-data.frame(a=c(22,33,28,51,25,39,54,NA,50,66),
b=c(22,33,NA,51,25,39,54,NA,50,66))
normalized<-function(y) {
  # scale only the non-missing values and leave each NA in place
  x<-y[!is.na(y)]
  x<-(x - min(x)) / (max(x) - min(x))
  y[!is.na(y)]<-x
  return(y)
}
apply(db[,c(1,2)],2,normalized)
Your output:
a b
[1,] 0.00000000 0.00000000
[2,] 0.25000000 0.25000000
[3,] 0.13636364 NA
[4,] 0.65909091 0.65909091
[5,] 0.06818182 0.06818182
[6,] 0.38636364 0.38636364
[7,] 0.72727273 0.72727273
[8,] NA NA
[9,] 0.63636364 0.63636364
[10,] 1.00000000 1.00000000
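Note that apply returns a matrix, not a data frame. If you want to keep the data frame and only overwrite columns 4, 5, 6 and 10, something like the sketch below may be closer to what you need. The example data frame is made up for illustration (your real data will differ), and it uses the normalized(x, ...) version from the question. Your attempt returned a list because data[,4] is a plain vector, so lapply called the function on every single value; indexing with data[cols] keeps whole columns, and assigning back into data[cols] preserves the data frame.

```r
# Made-up example data: 5 rows, 10 columns (V1..V10), with two NAs added
data <- as.data.frame(matrix(1:50, nrow = 5))
data[2, 4] <- NA
data[3, 10] <- NA

# Min-max scaling; extra arguments such as na.rm = TRUE are passed on to min() and max()
normalized <- function(x, ...) {
  (x - min(x, ...)) / (max(x, ...) - min(x, ...))
}

cols <- c(4, 5, 6, 10)  # columns to scale; all other columns stay untouched

# data[cols] is still a data frame, so lapply() visits one whole column at a time;
# assigning the result back into data[cols] keeps the data frame structure
data[cols] <- lapply(data[cols], normalized, na.rm = TRUE)

str(data)
```

The NA entries stay NA in the result, and a column whose values are all identical would give NaN because the range is zero, so you may want to handle that case separately.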
Upvotes: 6