GaryHill

Reputation: 115

R boxplot - how to normalize against given high and low limits instead of min and max

I have several measurements that need to be presented in the same boxplot chart, despite being on completely different scales. Each group (= measurement type) has its own high and low acceptance limits.

The data should be normalized in R so that the low limit is always -1 and the high limit is always +1 across all groups. I'll then set the Y-axis so that all measurements are properly displayed.

So far I've managed to draw the boxplot with min(NUM_VALUE) mapped to -1 and max(NUM_VALUE) mapped to +1, but this is not the result I want.

fake data (just part of the table):

ITEMID      NAME    SERIALID    NUM_VALUE   LOWER_LIMIT UPPER_LIMIT
Itemcode1   group1  SN1000      62.1        50          80
Itemcode1   group1  SN1001      62.6        50          80
Itemcode1   group1  SN1002      63.9        50          80
Itemcode1   group2  SN1006      1526.79     1526        1528
Itemcode1   group2  SN1007      1526.799    1526        1528
Itemcode1   group3  SN1015      1815.09     1814        1816
Itemcode1   group3  SN1016      1815.094    1814        1816
Itemcode1   group3  SN1017      1815.098    1814        1816
Itemcode1   group4  SN1025      1526.751    1526        1527
Itemcode1   group4  SN1026      1526.62     1526        1527
Itemcode1   group5  SN1028      1816.155    1816        1817
Itemcode1   group5  SN1029      1816.245    1816        1817

R code:

library(ggplot2)
library(data.table)
df <- read.table("data3.csv", header=TRUE, sep=";", stringsAsFactors=FALSE)
skl <- function(x){(x-min(x))/(max(x)-min(x))*2-1}
df <- transform(df,scaled=ave(df$NUM_VALUE,df$NAME,FUN=skl))
ggplot(df, aes(x = NAME, y = scaled)) + geom_boxplot()

Graph so far: [image: boxplot]

I'm very new to R.

Question: How can I scale the boxplot against UPPER_LIMIT and LOWER_LIMIT by group and present it all in the same graph?

Any help is highly appreciated, thank you!

Upvotes: 1

Views: 2915

Answers (1)

clemens

Reputation: 6813

Instead of using min() and max(), you can change your function skl() to take the lower and upper limits as arguments and scale against those.

The adapted function looks like this:

skl <- function(x, lower, upper){
  (x - lower)/(upper - lower) * 2 - 1
}
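As a quick sanity check (a minimal sketch, using the group1 limits of 50 and 80 from the sample data), the lower limit should map to -1, the upper limit to +1, and the midpoint to 0:

```r
# Same function as above, repeated so the snippet runs on its own
skl <- function(x, lower, upper) {
  (x - lower) / (upper - lower) * 2 - 1
}

skl(50, lower = 50, upper = 80)  # lower limit -> -1
skl(80, lower = 50, upper = 80)  # upper limit -> 1
skl(65, lower = 50, upper = 80)  # midpoint    -> 0
```

Values outside the acceptance limits end up below -1 or above +1, which is usually what you want to see in the plot.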

You can then go through the rows of your data.frame using apply():

df$scaled <- apply(df[, 4:6], 1, function(row) {
  skl(x = row[1], lower = row[2], upper = row[3])
})
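Since skl() only uses vectorized arithmetic, apply() is not strictly needed: the three columns can be passed in directly. The sketch below assumes the column names from the question and uses a small inline data.frame (four rows copied from the sample data) instead of data3.csv:

```r
skl <- function(x, lower, upper) {
  (x - lower) / (upper - lower) * 2 - 1
}

# A few rows from the question's sample data, typed in by hand
df <- data.frame(
  NAME        = c("group1", "group1", "group2", "group4"),
  NUM_VALUE   = c(62.1, 62.6, 1526.79, 1526.751),
  LOWER_LIMIT = c(50, 50, 1526, 1526),
  UPPER_LIMIT = c(80, 80, 1528, 1527)
)

# Each value is scaled against the limits in its own row
df$scaled <- skl(df$NUM_VALUE, df$LOWER_LIMIT, df$UPPER_LIMIT)
df$scaled
```

This also avoids depending on the column positions 4:6, so it keeps working if columns are added or reordered.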

The result looks like this:

df$scaled
 [1] -0.19333333 -0.16000000 -0.07333333 -0.21000000 -0.20100000  0.09000000  0.09400000  0.09800000
 [9]  0.50200000  0.24000000 -0.69000000 -0.51000000

Using your code, the boxplot will look like this:

library(ggplot2)
ggplot(df, aes(x = NAME, y = scaled)) + geom_boxplot()

[image: boxplot]
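If you also want the acceptance limits visible in the plot, one option is to draw reference lines at -1 and +1 and fix the Y-axis range. This is only a sketch: the scaled values are typed in from the output above, and the dashed red lines and the -1.2 to 1.2 range are arbitrary choices of mine, not something from the question.

```r
library(ggplot2)

# Scaled values copied from the result shown earlier, with their groups
df <- data.frame(
  NAME = rep(c("group1", "group2", "group3", "group4", "group5"),
             times = c(3, 2, 3, 2, 2)),
  scaled = c(-0.19333333, -0.16, -0.07333333, -0.21, -0.201,
             0.09, 0.094, 0.098, 0.502, 0.24, -0.69, -0.51)
)

p <- ggplot(df, aes(x = NAME, y = scaled)) +
  geom_boxplot() +
  # Reference lines at the normalized lower and upper limits
  geom_hline(yintercept = c(-1, 1), linetype = "dashed", colour = "red") +
  # Zoom the Y-axis without dropping any data points
  coord_cartesian(ylim = c(-1.2, 1.2))

print(p)
```

coord_cartesian() is used here instead of ylim() because ylim() removes points outside the range before the boxplot statistics are computed.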

Upvotes: 2
