Reputation: 65
I have searched long and hard for a solution using apply but I am unable to find exactly what I need. I'm a new R user comming over from Excel and need to calculate the percent difference from an observation with a control. A realistic sample data frame looks like this:
site <- c(rep(1, 10), rep(2,10), rep(3,10))
element <-rep(c("ca", "Mg", "K"), 10)
control <- seq(from= 1,to=60, by=2)
BA01 <- seq(from= 31,to=90, by=2)
BA02 <- seq(from= 21,to=80, by=2)
BA03 <- seq(from= 101,to=160, by=2)
mydf <- data.frame(site, element, control, BA01, BA02,BA03)
where BA01 to BA03 are different test which will be compared to the control.
all I would like to do, is make a formula like this: ((BA01-control)/control)*100
and have it calculated for every test column(BA01-BA03) and every row in the data frame. In Excel I could just copy and paste the site and element columns plus the headers BA01-BA03 the type the formula in cell C2 and drag the formula to the right as far as needed then down as far as needed and have my results. In R I'm having difficulty getting the same results. I've tried apply already but can not get it to work. Basically, I would like to have Site and Element as columns 1 and 2, followed by the results from the formula with BA01, BA02 and BA03 as the column names. Probably it wouldn't make a difference, but my real data frame will have upwards of 130 columns and several thousand rows.
Does anyone have some tips for me?
Thank you very much in advance for your help.
Dan
Upvotes: 2
Views: 2623
Reputation: 269852
Try this:
cbind(mydf[1:2], 100 * mydf[4:6] / mydf$control - 100)
The first 5 lines of output are:
site element BA01 BA02 BA03
1 1 ca 3000.00000 2000.00000 10000.0000
2 1 Mg 1000.00000 666.66667 3333.3333
3 1 K 600.00000 400.00000 2000.0000
4 1 ca 428.57143 285.71429 1428.5714
5 1 Mg 333.33333 222.22222 1111.1111
Upvotes: 1
Reputation: 226557
How about:
pdiff <- function(x,y) (x-y)/y*100
BAcols <- subset(mydf,select=c(BA01,BA02,BA03))
This subset
is readable for a small data frame but if you really have lots of rows you want to normalize you will want to select these columns by using a numeric range, i.e. mydf[,-(1:3)]
(drop the first three columns) or mydf[,4:ncol(mydf)]
(keep columns 4 until the end).
cbind(mydf[,1:2],sweep(BAcols,1,mydf$control,pdiff))
or
with(mydf,data.frame(site,element,sweep(BAcols,1,control,pdiff)))
Upvotes: 0
Reputation: 66844
If I understand correctly:
cbind(mydf[1:2],sapply(mydf[-(1:3)],function(x) 100*(x-mydf[[3]])/mydf[[3]]))
site element BA01 BA02 BA03
1 1 ca 3000.00000 2000.00000 10000.0000
2 1 Mg 1000.00000 666.66667 3333.3333
3 1 K 600.00000 400.00000 2000.0000
4 1 ca 428.57143 285.71429 1428.5714
5 1 Mg 333.33333 222.22222 1111.1111
...
Upvotes: 1