Reputation: 1032
I would like to know how to do median centering in R for my dataset. I have tried two approaches, but I am not sure whether either of them is correct.
Here is my code:
Simple dataset:
set.seed(100)
df = (matrix(rnorm(20), 5, 4))
df
[,1] [,2] [,3] [,4]
[1,] -0.50219235 0.3186301 0.08988614 -0.02931671
[2,] 0.13153117 -0.5817907 0.09627446 -0.38885425
[3,] -0.07891709 0.7145327 -0.20163395 0.51085626
[4,] 0.88678481 -0.8252594 0.73984050 -0.91381419
[5,] 0.11697127 -0.3598621 0.12337950 2.31029682
Using the scale function (which I've read about in some forums):
scale(df,center = T)
[,1] [,2] [,3] [,4]
[1,] -1.21733894 0.7238418 -0.2307246 -0.2640103
[2,] 0.04109693 -0.6766529 -0.2122225 -0.5541570
[3,] -0.37680714 1.3396200 -1.0750402 0.1719093
[4,] 1.54086497 -1.0553388 1.6517068 -0.9777997
[5,] 0.01218417 -0.3314701 -0.1337194 1.6240577
By subtracting the row median from each entry of the whole matrix:
df - rowMedians(df)
[,1] [,2] [,3] [,4]
[1,] -0.532477068 0.2883454 0.059601427 -0.05960143
[2,] 0.277821059 -0.4355008 0.242564354 -0.24256435
[3,] -0.294886674 0.4985631 -0.417603536 0.29488667
[4,] 0.929494272 -0.7825500 0.782549963 -0.87110472
[5,] -0.003204115 -0.4800375 0.003204115 2.19012144
But the two results are not the same, which confuses me: did I use the right function or not?
I would appreciate any help with this problem, or any further suggestions.
Best,
Upvotes: 1
Views: 10066
Reputation:
You can create a function and try as follows:
median_center <- function(x) {
  # subtract each column's median from that column
  apply(x, 2, function(y) y - median(y))
}
# apply it
median_center(df)
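As a quick sanity check, here is a minimal sketch using the example matrix from the question. After column-wise median centering, every column should have a median of zero. The `sweep()` call is only an alternative formulation added for comparison; it is not part of the answer above.

```r
# Example matrix from the question
set.seed(100)
df <- matrix(rnorm(20), 5, 4)

# Column-wise median centering, as defined in the answer
median_center <- function(x) {
  apply(x, 2, function(y) y - median(y))
}
centered <- median_center(df)

# Each column of the centered matrix should now have median 0
col_medians <- apply(centered, 2, median)
print(col_medians)

# Alternative base-R formulation: subtract each column's median with sweep()
centered_sweep <- sweep(df, 2, apply(df, 2, median))
print(all.equal(centered, centered_sweep))
```

Note that this centers each column. If the rows are the units to be centered, the margin has to be 1 instead of 2.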
Upvotes: 0
Reputation: 70
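This can be achieved using the center argument of the base scale function, passing one median per column, e.g. scale(x, center = apply(x, 2, median), scale = FALSE) (or scale = TRUE if you also want to scale your data). Note that for a matrix, center must have one value per column; a single overall median(x) only works when x has one column.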
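The sketch below, which assumes the example matrix from the question, may help explain why the two results in the question differ. `scale(df, center = TRUE)` subtracts column means and, since `scale = TRUE` is the default, also divides by the column standard deviations, so it is not median centering. When a numeric `center` is supplied, `scale()` expects one value per column of a matrix.

```r
# Example matrix from the question
set.seed(100)
df <- matrix(rnorm(20), 5, 4)

# What scale(df, center = TRUE) actually computes:
# (value - column mean) / column standard deviation
z <- scale(df, center = TRUE)
manual_z <- sweep(sweep(df, 2, colMeans(df)), 2, apply(df, 2, sd), "/")
print(all.equal(z, manual_z, check.attributes = FALSE))

# Column-wise median centering: pass one median per column and turn scaling off
col_med <- apply(df, 2, median)
med_centered <- scale(df, center = col_med, scale = FALSE)

# Column medians of the result should all be 0
med_after <- apply(med_centered, 2, median)
print(med_after)
```

Because `scale()` always works column by column, row-wise centering with it would require transposing the matrix first and transposing the result back.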
Upvotes: 0
Reputation: 103
Row medians can be obtained by:
rowmed <- apply(df,1,median)
Then you can simply subtract the row medians from the rows:
df - rowmed
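The subtraction works because R recycles the vector down the columns, so a vector with one entry per row lines up with the rows. A small sketch on the question's example matrix, with `sweep()` used as an assumed, more explicit equivalent:

```r
# Example matrix from the question
set.seed(100)
df <- matrix(rnorm(20), 5, 4)

# One median per row
rowmed <- apply(df, 1, median)

# Recycling runs down the columns, so element i of rowmed is
# subtracted from every entry in row i
row_centered <- df - rowmed

# Same operation with the margin spelled out (1 = rows)
row_centered_sweep <- sweep(df, 1, rowmed)
print(all.equal(row_centered, row_centered_sweep))

# Row medians after centering should be 0 up to floating-point error
max_abs_row_median <- max(abs(apply(row_centered, 1, median)))
print(max_abs_row_median)
```

This only lines up correctly when the vector has length equal to the number of rows. To subtract column medians instead, use `sweep(df, 2, ...)` rather than plain subtraction.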
Upvotes: 1