Laure Tomás Daza

Reputation: 137

Optimize 2 nested for loops with GRanges

I'm trying to optimize 2 nested for loops with GRanges in them.

My original GRanges is:

annot_cnv

GRanges object with 1733140 ranges and 3 metadata columns:
seqnames                ranges strand |      GENEID                       SAMPLE Segment_Mean
   <Rle>             <IRanges>  <Rle> | <character>                  <character>    <numeric>
    chr1   [3301765, 44149504]      + |       81569 TCGA-05-4433-01A-22D-1854-01       0.0889
    chr1   [3301765, 44149504]      + |      252995 TCGA-05-4433-01A-22D-1854-01       0.0889
    chr1   [3301765, 44149504]      + |      252995 TCGA-05-4433-01A-22D-1854-01       0.0889
    chr1   [3301765, 44149504]      + |      252995 TCGA-05-4433-01A-22D-1854-01       0.0889
    chr1   [3301765, 44149504]      + |      252995 TCGA-05-4433-01A-22D-1854-01       0.0889

And the 2 nested for loops are:

cnv_data <- data.frame()

for (i in unique(annot_cnv$SAMPLE)) 
{
 sample_data <- annot_cnv[annot_cnv$SAMPLE == i,]
 for (j in unique(sample_data$GENEID)) 
  {
   cnv_data[i,j] <- mean(sample_data$Segment_Mean[sample_data$GENEID == j])
  }
}

I'm trying to use foreach, but I don't know how to use it while keeping the sample names as row names and the GENEID values as column names in the final data.frame.

Can someone help me optimize these loops to do them in parallel?

Upvotes: 0

Views: 147

Answers (2)

Martin Morgan

Reputation: 46876

The general strategy is to split the Segment_Mean values into a list with splitAsList() and then call mean(), which calculates the mean of each element of the list. So something along the lines of

# one group per GENEID/SAMPLE combination that actually occurs
grp = interaction(annot_cnv$GENEID, annot_cnv$SAMPLE, drop = TRUE)
vals = splitAsList(annot_cnv$Segment_Mean, grp)
mean(vals)
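To get the layout the question asks for (sample names as row names, GENEID values as column names), the grouped means can also be computed directly as a matrix. The following is only a minimal sketch using base R's tapply(); the toy data frame and its values are made up for illustration and stand in for the metadata columns of annot_cnv.

```r
# Toy stand-in for the metadata columns of annot_cnv (hypothetical values).
toy <- data.frame(
  SAMPLE = c("S1", "S1", "S1", "S2", "S2"),
  GENEID = c("81569", "252995", "252995", "81569", "252995"),
  Segment_Mean = c(0.10, 0.20, 0.40, 0.30, 0.50),
  stringsAsFactors = FALSE
)

# Mean of Segment_Mean for every SAMPLE x GENEID combination.
# Rows are named by sample and columns by gene; a combination
# with no rows in the data would come out as NA.
cnv_mat <- tapply(toy$Segment_Mean, list(toy$SAMPLE, toy$GENEID), mean)

# Convert to a data.frame; row and column names are carried over.
cnv_data <- as.data.frame(cnv_mat)
```

With the real data, the same call would presumably be made on annot_cnv$Segment_Mean, annot_cnv$SAMPLE and annot_cnv$GENEID, which avoids both loops without needing any parallel machinery.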

It is better to ask questions about Bioconductor packages on the Bioconductor support site.

Upvotes: 1

MKR

Reputation: 20095

You don't need a for loop. One of many options available in R, using the dplyr package, is:

library(dplyr)

# group_by() needs a data.frame, so convert the metadata columns first
as.data.frame(mcols(annot_cnv)) %>%
  group_by(SAMPLE, GENEID) %>%
  summarise(Avg = mean(Segment_Mean))

The Avg column created above will contain your desired result.

Upvotes: 0
