Reputation: 4926
I am trying to normalize some columns of a data frame so that they have the same mean. My current solution works, but I feel there must be a simpler way of doing this.
# we make a copy of women
w = women
# print out the col Means
colMeans(women)
height weight
65.0000 136.7333
# create a vector of factors to normalize with
factor = colMeans(women)/colMeans(women)[1]
# normalize the copy of women that we previously made
for(i in 1:length(factor)){w[,i] <- w[,i] / factor[i]}
#We achieved our goal to have same means in the columns
colMeans(w)
height weight
65 65
I can get the same result easily using apply, but is there something simpler, like just doing women/factor, that gives the correct answer?
By the way, what is women/factor actually doing? The result of:
colMeans(women/factor)
height weight
49.08646 98.40094
is not the same.
Upvotes: 0
Views: 106
Reputation: 886938
Also:
rowMeans(t(women)/factor)
#height weight
#65 65
Regarding your question:
I can come up with the same thing easily using apply, but is there something easier, like just doing women/factor, that gets the correct answer? By the way, what is women/factor actually doing?
women/factor ## is similar to
unlist(women)/rep(factor,nrow(women))
What you need is:
unlist(women)/rep(factor, each=nrow(women))
or
women/rep(factor, each=nrow(women))
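In my solution (the one using t(women)), I didn't need rep: after transposing, each column of t(women) holds one row of women, so factor gets recycled in exactly the right order.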
t(women) ##matrix
as.vector(t(women))/factor #will give same result as above
or just
t(women)/factor #preserve the dimensions for ?rowMeans
In short, the division is carried out column-wise: R walks down each column in turn, recycling factor as it goes.
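The equivalence described above can be checked directly. This is a minimal sketch using the built-in women data set; the name fac is my own stand-in for the question's factor vector (renamed only to avoid masking base::factor).

```r
# `fac` is an illustrative name for the question's `factor` vector
fac <- colMeans(women) / colMeans(women)[1]

# Plain division recycles `fac` down the columns: f1, f2, f1, f2, ...
recycled <- unlist(women) / rep(fac, nrow(women))
same_as_plain <- all.equal(unname(unlist(women / fac)), unname(recycled))
same_as_plain

# Repeating each element nrow(women) times lines the divisors up with the columns
fixed <- women / rep(fac, each = nrow(women))
colMeans(fixed)
```

The first comparison should come back TRUE, and the column means of fixed should both equal the mean of height.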
Upvotes: 1
Reputation: 92282
You can use mapply too:
colMeans(mapply("/", w, factor))
As for what women/factor does: women is a data.frame with two columns, while factor is a numeric vector of length two. When you do women/factor, R goes through the entries of women column by column, dividing the first entry by factor[1], the second by factor[2], the third by factor[1] again, and so on. Because factor is shorter than women, R recycles factor over and over again.
You can see, for example, that every other entry of women[, 1]/factor (starting with the first) equals the corresponding entry of women[, 1] (because factor[1] equals 1).
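The recycling pattern is easier to see on a tiny example. The data frame d and the vector f below are hypothetical toy values, not part of the original question; they are chosen so that the divisor applied to each cell can be read straight off the result.

```r
# Hypothetical toy data: every cell is 10, so the result reveals the divisor
d <- data.frame(a = rep(10, 3), b = rep(10, 3))
f <- c(1, 2)

res <- d / f
res
#    a  b
# 1 10  5
# 2  5 10
# 3 10  5

# Divisors actually used, laid out in the shape of d (filled column by column)
divisors <- matrix(rep_len(f, nrow(d) * ncol(d)), nrow = nrow(d))
divisors
```

Note that with an odd number of rows the second column starts with f[2], which is also what happens with the 15 rows of women.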
Upvotes: 1
Reputation: 3242
One way of doing this is to use sweep. By default this function subtracts a summary statistic along the given margin (2 means columns here), but you can also specify a different function to apply. In this case, division:
colMeans(sweep(women, 2, factor, '/'))
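For readability, the same call can be written with named arguments. This is only a sketch on the built-in women data set; fac is my own name for the question's factor vector, and the comparison with the mapply approach from the other answer is added just as a sanity check.

```r
# `fac` is an illustrative name for the question's `factor` vector
fac <- colMeans(women) / colMeans(women)[1]

# Divide each column (MARGIN = 2) by its matching element of `fac`
w_sweep <- sweep(women, MARGIN = 2, STATS = fac, FUN = "/")
colMeans(w_sweep)

# Sanity check against the mapply approach
w_mapply <- mapply("/", women, fac)
all.equal(colMeans(w_sweep), colMeans(w_mapply))
```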
Upvotes: 1