Execute code on different subsets

Question

I have a data.frame with a couple of thousand rows. I am applying several lines of code to subsets of this data.

I have 4 subsets in a column "mergeorder$phylum":

[1] "ascomycota"      "basidiomycota"   "unidentified"   
[4] "chytridiomycota"

On every subset I have to apply this set of functions separately:

ascomycota<-mergeorder[mergeorder$phylum %in% c("ascomycota"), ]
group_ascomycota <- aggregate(ascomycota[,2:62], by=list(ascomycota$order), FUN=sum)

row.names(group_ascomycota)<-group_ascomycota[,1]
group_ascomycota$sum <-apply(group_ascomycota[,-1],1,sum) 

dat5 <-sweep(group_ascomycota[,2:62], 2, colSums(group_ascomycota[2:62]), '/')
dat5$sum <-apply(group_ascomycota[,-1],1,sum)
reorder_dat5 <- dat5[order(dat5$sum, decreasing=T),]

reorder_dat5$OTU_ID <- row.names(reorder_dat5)
FINITO<-reorder_dat5[1:15,]

write.table(FINITO, file="output_ITS1/ITS1_ascomycota_order_top15.csv", col.names=TRUE,row.names=FALSE, sep=",", quote=FALSE)

This code works. However, I would like to apply this code without manually replacing every "ascomycota" with "basidiomycota", "unidentified", "chytridiomycota".

What function should I use, and how should I use it? I've been struggling with sapply() and repeat() but haven't gotten far.

The end result should run the whole code for each subset and export a separate csv file for each.

Many thanks for your answer

Mhairi McNeill · Accepted Answer

It's usually possible to write code that handles all subsets in one go. However, what you are doing is pretty complicated. The best thing to do might be to gather all that into a function and then just run the function for each subset. Something like this:

subset_transform <- function(subset){
  t <-mergeorder[mergeorder$phylum %in% c(subset), ]
  group_t <- aggregate(t[,2:62], by=list(t$order), FUN=sum)

  row.names(group_t)<-group_t[,1]
  group_t$sum <-apply(group_t[,-1],1,sum) 

  dat5 <-sweep(group_t[,2:62], 2, colSums(group_t[2:62]), '/')
  dat5$sum <-apply(group_t[,-1],1,sum)
  reorder_dat5 <- dat5[order(dat5$sum, decreasing=T),]

  reorder_dat5$OTU_ID <- row.names(reorder_dat5)
  FINITO<-reorder_dat5[1:15,]

  write.table(FINITO, file = paste0("output_ITS1/ITS1_", subset, "_order_top15.csv"), col.names=TRUE,row.names=FALSE, sep=",", quote=FALSE)
}

subset_transform("ascomycota")
subset_transform("basidiomycota")
subset_transform("unidentified")
subset_transform("chytridiomycota")
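
The repeated calls can also be driven by the data itself instead of being typed out. Below is a minimal sketch of that idea on made-up data, not the asker's table: `toy`, `sum_by_order`, `count_cols`, and the temporary output directory are all hypothetical names chosen for illustration, and only the aggregation step is reproduced.

```r
# Toy stand-in for mergeorder (hypothetical data): a phylum column,
# an order column, and two count columns instead of the real 61.
toy <- data.frame(
  phylum = c("ascomycota", "ascomycota",
             "basidiomycota", "basidiomycota", "basidiomycota"),
  order  = c("a1", "a2", "b1", "b1", "b2"),
  s1     = c(1, 3, 2, 2, 4),
  s2     = c(5, 5, 1, 1, 8),
  stringsAsFactors = FALSE
)

# Hypothetical helper: sum the count columns per order for one subset.
sum_by_order <- function(df, count_cols) {
  aggregate(df[, count_cols, drop = FALSE],
            by = list(order = df$order), FUN = sum)
}

# split() returns one data frame per phylum; lapply() applies the helper to each.
results <- lapply(split(toy, toy$phylum), sum_by_order,
                  count_cols = c("s1", "s2"))

# Write one csv per phylum. paste0() joins the pieces without inserting spaces.
out_dir <- tempdir()
for (p in names(results)) {
  out_file <- file.path(out_dir, paste0("ITS1_", p, "_order_top15.csv"))
  write.table(results[[p]], file = out_file, sep = ",",
              col.names = TRUE, row.names = FALSE, quote = FALSE)
}
```

With the function defined above, the same pattern would presumably reduce the four calls to a single loop such as `for (p in unique(mergeorder$phylum)) subset_transform(p)`, so a new phylum in the data needs no code change.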
