Reputation: 137
I'm trying to optimize two nested for loops that operate on a GRanges object.
My original GRanges is:
```
annot_cnv
GRanges object with 1733140 ranges and 3 metadata columns:
seqnames ranges strand | GENEID SAMPLE Segment_Mean
<Rle> <IRanges> <Rle> | <character> <character> <numeric>
chr1 [3301765, 44149504] + | 81569 TCGA-05-4433-01A-22D-1854-01 0.0889
chr1 [3301765, 44149504] + | 252995 TCGA-05-4433-01A-22D-1854-01 0.0889
chr1 [3301765, 44149504] + | 252995 TCGA-05-4433-01A-22D-1854-01 0.0889
chr1 [3301765, 44149504] + | 252995 TCGA-05-4433-01A-22D-1854-01 0.0889
chr1 [3301765, 44149504] + | 252995 TCGA-05-4433-01A-22D-1854-01 0.0889
```
And the 2 nested for loops are:
```r
cnv_data <- data.frame()
for (i in unique(annot_cnv$SAMPLE)) {
  sample_data <- annot_cnv[annot_cnv$SAMPLE == i, ]
  for (j in unique(sample_data$GENEID)) {
    cnv_data[i, j] <- mean(sample_data$Segment_Mean[sample_data$GENEID == j])
  }
}
```
I'm trying to use foreach(), but I don't know how to use it while keeping the sample names as row names and the GENEIDs as column names in the final data.frame.
Can someone help me optimize these loops to do them in parallel?
Upvotes: 0
Reputation: 46876
The general strategy is to split the values by gene and sample with splitAsList() (not splitByList()), and then call mean() on the resulting list; mean() computes the mean of each list element. So, along the lines of:
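```r
library(GenomicRanges)  # also attaches S4Vectors/IRanges: splitAsList() and mean() on Lists
grp <- interaction(annot_cnv$GENEID, annot_cnv$SAMPLE, drop = TRUE)  # one level per gene/sample pair
seg_list <- splitAsList(annot_cnv$Segment_Mean, grp)                # NumericList, one element per pair
mean(seg_list)                                                       # per-element means
```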
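The same split-then-mean idea can be checked in plain base R, where split() plays the role of splitAsList() and vapply() takes each element's mean. This is only a sketch: the three short vectors are hypothetical toy stand-ins for the Segment_Mean, GENEID and SAMPLE columns, not real data.

```r
# Hypothetical toy stand-ins for annot_cnv$Segment_Mean, $GENEID and $SAMPLE.
seg  <- c(0.0889, 0.0889, 0.5, -0.2, 0.1)
gene <- c("81569", "252995", "252995", "81569", "252995")
samp <- c("S1", "S1", "S1", "S2", "S2")

# One factor level per gene/sample pair that actually occurs; drop = TRUE
# discards empty combinations, which would otherwise produce NaN means.
grp <- interaction(gene, samp, drop = TRUE)

# split() groups the values by pair; vapply() computes one mean per group.
means <- vapply(split(seg, grp), mean, numeric(1))
means
```

The names of the result have the form "GENEID.SAMPLE", so each mean can be traced back to its pair.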
Questions about Bioconductor packages are better asked on the Bioconductor support site.
Upvotes: 1
Reputation: 20095
You don't need a for loop. One of many options available in R is the dplyr package:
```r
library(dplyr)
# dplyr verbs work on data frames, so convert the GRanges first
as.data.frame(annot_cnv) %>%
  group_by(SAMPLE, GENEID) %>%
  mutate(Avg = mean(Segment_Mean))
```
The Avg column created above contains the mean Segment_Mean of each SAMPLE/GENEID group, repeated on every row of that group.
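For comparison, here is a base-R sketch that needs no extra packages. It uses a hypothetical toy data frame; with the real object, something like as.data.frame(mcols(annot_cnv)) is assumed to supply the SAMPLE, GENEID and Segment_Mean columns. ave() mirrors the group_by() + mutate() call row by row. tapply() collapses the data to one value per pair, laid out with samples as rows and genes as columns, which is the table the original nested loops build.

```r
# Hypothetical toy stand-in for the metadata columns of annot_cnv.
df <- data.frame(
  SAMPLE       = c("S1", "S1", "S1", "S2", "S2"),
  GENEID       = c("81569", "252995", "252995", "81569", "252995"),
  Segment_Mean = c(0.0889, 0.0889, 0.5, -0.2, 0.1),
  stringsAsFactors = FALSE
)

# Base-R counterpart of group_by(SAMPLE, GENEID) %>% mutate(Avg = mean(Segment_Mean)):
# every row receives the mean of its SAMPLE/GENEID group.
df$Avg <- ave(df$Segment_Mean, df$SAMPLE, df$GENEID, FUN = mean)

# One mean per pair as a matrix: samples as row names, GENEIDs as column names.
# Pairs with no rows come out as NA, as they do in the original loops.
wide <- tapply(df$Segment_Mean, list(df$SAMPLE, df$GENEID), mean)
cnv_data <- as.data.frame(wide)
cnv_data
```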
Upvotes: 0