Reputation: 4243

Normalize by set standard deviation from mean of every column (excluding first)

I have a dataset below:

  A       B     C      D
500       2     4      6
501       6     8     45
502       4     7      9

How do I normalize every column except the first so that the values lie within a set deviation from the mean of each column?

So for example below are the means for each column:

B = 4
C = 6.333
D = 20

I then want to normalize each column so that its bounds are no more than 25% of the mean away from the mean in either direction.

I think you can do it with rescale but I just don't know how to apply it to all columns:

library(scales)
rescale(x, to = c(mean(x) - 0.25*mean(x), mean(x) + 0.25*mean(x)))

I know the following is one way to normalize, but it doesn't take into account the bounds, i.e. the set deviation of 25% from the mean:

library(dplyr)

# Min-max scaling of a vector to the range [0, 1]
normalized <- function(x){
  return((x-min(x)) / (max(x)-min(x)))
}

# Apply to every column except A
normalized_dataset<-df %>% 
  mutate_at(vars(-one_of("A")), normalized)

Upvotes: 1

Answers (3)

moodymudskipper

Reputation: 47350

Would this work?

df <- read.table(text="
  A       B     C      D
500       2     4      6
501       6     8     45
502       4     7      9", header = TRUE)

df2 <- df
# Keep each column mean; scale the deviations so the largest one is 25% of the mean
df2[-1] <- lapply(df[-1], function(x) {
  mean(x) + (x - mean(x)) * 0.25 * mean(x) / max(abs(x - mean(x)))
})

#     A B        C    D
# 1 500 3 4.750000 17.2
# 2 501 5 7.464286 25.0
# 3 502 4 6.785714 17.8

The mean stays the same for each relevant column, but the values are rescaled so that the value furthest from the mean ends up at a distance of 25% of the mean from it.
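
A quick way to check that claim, as a sketch only: the helper name `rescale_about_mean` and its `pct` argument are illustrative (not from any package), and the data is the example from the question.

```r
# Illustrative helper (the name is an assumption, not a library function):
# scale the deviations from the mean so the largest one equals pct * mean.
rescale_about_mean <- function(x, pct = 0.25) {
  m <- mean(x)
  dev <- x - m
  m + dev * pct * m / max(abs(dev))
}

df <- data.frame(A = 500:502, B = c(2, 6, 4), C = c(4, 8, 7), D = c(6, 45, 9))
df2 <- df
df2[-1] <- lapply(df[-1], rescale_about_mean)

# Compare the column means before and after the rescaling
all.equal(colMeans(df[-1]), colMeans(df2[-1]))

# Largest deviation from the mean, as a fraction of the mean, per column
sapply(df2[-1], function(x) max(abs(x - mean(x))) / mean(x))
```

The first check should report the means as equal and the second should give 0.25 for each column. Note that a constant column would divide by zero here, so it would need special handling.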

Upvotes: 1

onlyphantom

Reputation: 9613

If you already have code that does what you need but struggle to apply it to all columns except the first, try the simple base R approach.

Your function:

## your rescale function
fun1 <- function(x){
    return(scales::rescale(x, to = c(mean(x) - 0.25*mean(x), mean(x) + 0.25*mean(x))))
}

Apply to all columns except the first:

dat[2:4] <- lapply(dat[2:4], fun1)
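
If the number of columns is not fixed, the positions `2:4` could be replaced by a selection on names. A minimal sketch, assuming the column to skip is called `A` as in the question and that the `scales` package is installed; `cols` is just an illustrative variable name:

```r
dat <- data.frame(A = 500:502, B = c(2, 6, 4), C = c(4, 8, 7), D = c(6, 45, 9))

# Same rescaling as fun1 above, with the bounds written as 75% and 125% of the mean
fun1 <- function(x) {
  scales::rescale(x, to = c(0.75 * mean(x), 1.25 * mean(x)))
}

# Every column except "A", whatever its position in the data frame
cols <- setdiff(names(dat), "A")
dat[cols] <- lapply(dat[cols], fun1)

dat
```

Selecting by name keeps the code working if columns are added or reordered later.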

Upvotes: 1

Rui Barradas

Reputation: 76673

I assume the function rescale comes from the package scales.

This is a typical example of the use of the *apply family of functions.
I will work on a copy of the data and rescale the copy. If you don't want to keep the original, it's a simple matter to modify the code below.

dat2 <- dat

dat2[-1] <- lapply(dat2[-1], function(x)
    scales::rescale(x, to = c(mean(x) - 0.25*mean(x), mean(x) + 0.25*mean(x))))

dat2
#    A B        C        D
#1 500 3 4.750000 15.00000
#2 501 5 7.916667 25.00000
#3 502 4 7.125000 15.76923
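
One thing worth noting about this output: `rescale` maps the minimum and maximum of each column onto the two bounds, so the column mean generally moves unless the values are symmetric about it. A small sketch to check this on the example data, assuming the `scales` package is installed:

```r
dat <- data.frame(A = 500:502, B = c(2, 6, 4), C = c(4, 8, 7), D = c(6, 45, 9))

dat2 <- dat
dat2[-1] <- lapply(dat2[-1], function(x)
    scales::rescale(x, to = c(0.75 * mean(x), 1.25 * mean(x))))

# Column means before and after, side by side
rbind(original = colMeans(dat[-1]), rescaled = colMeans(dat2[-1]))

# Min and max of each rescaled column (the two bounds)
sapply(dat2[-1], range)
```

Here B keeps its mean of 4 because its values are symmetric about it, while the means of C and D shift (D goes from 20 to about 18.59). If the mean has to stay fixed, the deviations need to be scaled about the mean instead.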

Data.

dat <- read.table(text = "
  A       B     C      D
500       2     4      6
501       6     8     45
502       4     7      9 
", header = TRUE)

Upvotes: 1
