pedrosaurio

Reputation: 4926

How to transform the columns of a data frame according to the values in a vector in R?

I am trying to normalize some columns of a data frame so that they all have the same mean. My current solution works, but I suspect there is a simpler way of doing this.

# make a copy of women
w <- women
# print out the column means
colMeans(women)
#  height   weight
# 65.0000 136.7333
# create a vector of factors to normalize with
factor <- colMeans(women) / colMeans(women)[1]
# normalize the copy of women that we made earlier
for (i in seq_along(factor)) {w[, i] <- w[, i] / factor[i]}
# we achieved our goal: the columns now have the same mean
colMeans(w)
# height weight
#     65     65

I could do the same thing easily using apply, but is there something simpler, such as just writing women/factor, that gives the correct answer? And what is women/factor actually doing? Running

colMeans(women/factor)
#   height   weight
# 49.08646 98.40094

does not give the same result.

Upvotes: 0

Views: 106

Answers (3)

akrun

Reputation: 886938

Another option is to transpose, divide, and take row means:

rowMeans(t(women)/factor)
#height weight 
#65     65 

Regarding your question:

I can come up with the same thing easily using apply, but is there something easier, like just doing women/factor, that gives the correct answer? By the way, what is women/factor actually doing?

women/factor ## divides by the same values as

unlist(women)/rep(factor, nrow(women)) ## alternates factor[1] and factor[2] cell by cell
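A minimal sketch of that recycling, assuming the built-in women data set and the factor vector from the question (the name divisors is only for illustration): it builds the divisor each cell receives when factor is recycled down the columns, and checks that this reproduces women/factor and the unlist/rep form.

```r
# Ratio of column means, as in the question: height = 1, weight ~ 2.1036
factor <- colMeans(women) / colMeans(women)[1]

# Recycle factor over all 30 cells in column-major order (down column 1,
# then column 2): the divisors alternate factor[1], factor[2], factor[1], ...
divisors <- matrix(rep_len(factor, prod(dim(women))), nrow = nrow(women))

# women/factor divides each cell by the matching entry of divisors ...
stopifnot(isTRUE(all.equal(as.matrix(women / factor),
                           as.matrix(women) / divisors,
                           check.attributes = FALSE)))
# ... which is also what unlist(women)/rep(factor, nrow(women)) computes
stopifnot(isTRUE(all.equal(unname(unlist(women / factor)),
                           unname(unlist(women) / rep(factor, nrow(women))))))

colMeans(women / factor)  # the 49.08646 / 98.40094 from the question
```

Because women has an odd number of rows (15), the weight column starts on factor[2], so neither column is divided consistently.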

What you need is:

unlist(women)/rep(factor, each=nrow(women))

or

women/rep(factor, each=nrow(women))
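A short sketch of the each = form, again assuming women and factor from the question (per_column and v are illustrative names): repeating each element nrow(women) times gives one divisor per column, so the result keeps the data frame structure and both means become 65.

```r
factor <- colMeans(women) / colMeans(women)[1]

# 15 copies of factor[1] followed by 15 copies of factor[2]:
# one divisor per column, matching column-major order
per_column <- women / rep(factor, each = nrow(women))

class(per_column)     # still a data frame
colMeans(per_column)  # both columns now average 65

# the same values via unlist(), as a plain named vector
v <- unlist(women) / rep(factor, each = nrow(women))
stopifnot(isTRUE(all.equal(unname(v), unname(unlist(per_column)))))
```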

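In my solution I didn't need rep: after t(), each column of the matrix holds one (height, weight) pair, so the length-2 factor is recycled onto the right rows automatically.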

t(women) ## 2 x 15 matrix: row 1 is height, row 2 is weight

as.vector(t(women))/factor ## same values as above, but interleaved (height1, weight1, height2, ...)

or just

t(women)/factor ## keeps the dimensions, as rowMeans needs (see ?rowMeans)

In short, R recycles the shorter vector in column-major order (down each column, then on to the next), for matrices and data frames alike.
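A sketch of why the transpose works, assuming women and factor from the question (m, scaled and v are illustrative names): in the 2 x 15 matrix, column-major recycling of a length-2 vector pairs factor[1] with every height and factor[2] with every weight.

```r
factor <- colMeans(women) / colMeans(women)[1]

# t() turns the 15 x 2 data frame into a 2 x 15 numeric matrix:
# row 1 is height, row 2 is weight, each column is one (height, weight) pair
m <- t(women)
dim(m)

# Recycling down each column now lines up with the rows:
# every height is divided by factor[1], every weight by factor[2]
scaled <- m / factor
rowMeans(scaled)

# as.vector() drops the dimensions but keeps the same values, interleaved
v <- as.vector(m) / factor
stopifnot(isTRUE(all.equal(v, as.vector(scaled))))
```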

Upvotes: 1

David Arenburg

Reputation: 92282

You can use mapply too:

colMeans(mapply("/", women, factor))
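A sketch of what mapply does here, assuming women and factor from the question: it walks over the columns of the data frame and the elements of factor in parallel, so each column is divided by exactly one number. mapply simplifies the result to a matrix; wrapping it in as.data.frame() is one way to get a data frame back (scaled and scaled_df are illustrative names).

```r
factor <- colMeans(women) / colMeans(women)[1]

# Divide column i of women by factor[i]; the result is simplified to a 15 x 2 matrix
scaled <- mapply("/", women, factor)
colMeans(scaled)

# Optional: convert back to a data frame if later code expects one
scaled_df <- as.data.frame(scaled)
str(scaled_df)
```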

Regarding your question about what women/factor does: women is a data.frame with two columns of 15 values each, while factor is a numeric vector of length two. When you do women/factor, R does not divide each column by its own element of factor. Instead, because factor is shorter than women, R recycles it over all 30 cells, going down the first column and then the second, so the cells are divided alternately by factor[1] and factor[2]. You can see, for example, that the 1st, 3rd, 5th, ... entries of women[, 1]/factor equal the corresponding entries of women[, 1] (because factor[1] equals 1).
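A small check of that last observation, assuming women and factor from the question (h, odd and even are illustrative names). Since 15 is not a multiple of 2, R warns about the recycling, and the sketch suppresses that warning.

```r
factor <- colMeans(women) / colMeans(women)[1]

# women[, 1] has 15 values and factor has 2, so factor is recycled as
# factor[1], factor[2], factor[1], ...; the length mismatch triggers a warning
h <- suppressWarnings(women[, 1] / factor)

odd  <- seq(1, length(h), by = 2)
even <- seq(2, length(h), by = 2)

# Odd positions were divided by factor[1] == 1, so they are unchanged
stopifnot(isTRUE(all.equal(h[odd], women[odd, 1], check.attributes = FALSE)))
# Even positions were divided by factor[2]
stopifnot(isTRUE(all.equal(h[even], women[even, 1] / factor[2], check.attributes = FALSE)))
```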

Upvotes: 1

Edwin

Reputation: 3242

One way of doing this is with sweep. It sweeps a summary statistic (one value per row or per column, chosen with MARGIN) out of an array; by default it subtracts (FUN = "-"), but you can also specify a different function to apply. In this case, a division along the columns (MARGIN = 2):

colMeans(sweep(women, 2, factor, '/'))
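The same call with the arguments named, as a sketch assuming women and factor from the question (scaled and centred are illustrative names). The second call leaves FUN at its default "-" to show the subtracting behaviour for comparison.

```r
factor <- colMeans(women) / colMeans(women)[1]

# MARGIN = 2: STATS holds one value per column; FUN = "/" replaces the default "-"
scaled <- sweep(women, MARGIN = 2, STATS = factor, FUN = "/")
colMeans(scaled)

# With the default FUN ("-"), each column is instead centred on its own mean,
# so both column means become (numerically) zero
centred <- sweep(women, 2, colMeans(women))
colMeans(centred)
```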

Upvotes: 1
