Dan_77
Dan_77

Reputation: 65

using apply to calculate across rows and columns

I have searched long and hard for a solution using apply but I am unable to find exactly what I need. I'm a new R user comming over from Excel and need to calculate the percent difference from an observation with a control. A realistic sample data frame looks like this:

site <- c(rep(1, 10), rep(2,10), rep(3,10))
element <-rep(c("ca", "Mg", "K"), 10)
control <- seq(from= 1,to=60, by=2)
BA01 <- seq(from= 31,to=90, by=2)
BA02 <- seq(from= 21,to=80, by=2)
BA03 <- seq(from= 101,to=160, by=2)
mydf <- data.frame(site, element, control, BA01, BA02,BA03)

where BA01 to BA03 are different test which will be compared to the control.

all I would like to do, is make a formula like this: ((BA01-control)/control)*100

and have it calculated for every test column(BA01-BA03) and every row in the data frame. In Excel I could just copy and paste the site and element columns plus the headers BA01-BA03 the type the formula in cell C2 and drag the formula to the right as far as needed then down as far as needed and have my results. In R I'm having difficulty getting the same results. I've tried apply already but can not get it to work. Basically, I would like to have Site and Element as columns 1 and 2, followed by the results from the formula with BA01, BA02 and BA03 as the column names. Probably it wouldn't make a difference, but my real data frame will have upwards of 130 columns and several thousand rows.

Does anyone have some tips for me?
Thank you very much in advance for your help.

Dan

Upvotes: 2

Views: 2623

Answers (3)

G. Grothendieck
G. Grothendieck

Reputation: 269852

Try this:

cbind(mydf[1:2], 100 *  mydf[4:6] / mydf$control - 100)

The first 5 lines of output are:

   site element       BA01       BA02       BA03
1     1      ca 3000.00000 2000.00000 10000.0000
2     1      Mg 1000.00000  666.66667  3333.3333
3     1       K  600.00000  400.00000  2000.0000
4     1      ca  428.57143  285.71429  1428.5714
5     1      Mg  333.33333  222.22222  1111.1111

Upvotes: 1

Ben Bolker
Ben Bolker

Reputation: 226557

How about:

pdiff <- function(x,y) (x-y)/y*100
BAcols <- subset(mydf,select=c(BA01,BA02,BA03))  

This subset is readable for a small data frame but if you really have lots of rows you want to normalize you will want to select these columns by using a numeric range, i.e. mydf[,-(1:3)] (drop the first three columns) or mydf[,4:ncol(mydf)] (keep columns 4 until the end).

cbind(mydf[,1:2],sweep(BAcols,1,mydf$control,pdiff))

or

 with(mydf,data.frame(site,element,sweep(BAcols,1,control,pdiff)))

Upvotes: 0

James
James

Reputation: 66844

If I understand correctly:

cbind(mydf[1:2],sapply(mydf[-(1:3)],function(x) 100*(x-mydf[[3]])/mydf[[3]]))
   site element       BA01       BA02       BA03
1     1      ca 3000.00000 2000.00000 10000.0000
2     1      Mg 1000.00000  666.66667  3333.3333
3     1       K  600.00000  400.00000  2000.0000
4     1      ca  428.57143  285.71429  1428.5714
5     1      Mg  333.33333  222.22222  1111.1111
...

Upvotes: 1

Related Questions