Reputation: 166
I apologise for what is probably a very basic question, but I don't know the right term to search for. If you could point me to another post, or help with my (convoluted) code, I'd really appreciate it.
Essentially, I've got a giant table containing gene ontology results from 22 co-expression modules. There are 22 levels of the "class" column, so I want to end up with at most 220 rows. I could not find an option in the anRichment package to limit the Gene Ontology output to the top 10 terms per module, so I am trying to filter this giant table manually, keeping only the top 10 hits per module level (or fewer, if a module has fewer than 10).
dim(table.display)
[1] 2388 18
table.display[1:5,1:5]
class rank dataSetID dataSetName inGroups
1 black 1 GO:0007399 nervous system development GO|GO.BP
2 black 2 GO:0045202 synapse GO|GO.CC
3 black 3 GO:0050808 synapse organization GO|GO.BP
4 black 4 GO:0031175 neuron projection development GO|GO.BP
5 black 5 GO:0048812 neuron projection morphogenesis GO|GO.BP
table.display[2383:2388,1:5]
class rank dataSetID dataSetName inGroups
2383 yellow 54 GO:0048167 regulation of synaptic plasticity GO|GO.BP
2384 yellow 55 GO:0031226 intrinsic component of plasma membrane GO|GO.CC
2385 yellow 56 GO:0001505 regulation of neurotransmitter levels GO|GO.BP
2386 yellow 57 GO:0051960 regulation of nervous system development GO|GO.BP
2387 yellow 58 GO:0022857 transmembrane transporter activity GO|GO.MF
2388 yellow 59 GO:1903305 regulation of regulated secretory pathway GO|GO.BP
What I did was:
top10each_final <- list() # create a new list
for (module in all_modules) { # for clause to add the top 10 hits of each module to the empty list
  top10each <- table.display[table.display$class==module,]
  top10each_final[[module]] <- top10each[c(1:10),]
}
top10each_final_2 <- data.frame(matrix(unlist(top10each_final), nrow=length(top10each_final), byrow=T)) # convert the list to a table
The list looks right when I view it in RStudio (it contains 22 sublists, one per level of "class"), but the conversion to a data frame is not working. Instead of a top10each_final_2 table in which the elements of top10each_final are stacked vertically, I get several repeating elements in a weird order.
top10each_final_2[1:5,1:5]
X1 X2 X3 X4 X5
1 black black black black black
2 blue blue blue blue blue
3 brown brown brown brown brown
4 cyan cyan cyan cyan cyan
5 darkgreen darkgreen darkgreen darkgreen darkgreen
My desired final output should look similar to the input, but just containing the top 10 hits per class level. E.g.:
desired_table.display[1:20,1:5]
class rank dataSetID dataSetName inGroups
1 black 1 GO:0007399 nervous system development GO|GO.BP
2 black 2 GO:0045202 synapse GO|GO.CC
3 black 3 GO:0050808 synapse organization GO|GO.BP
4 black 4 GO:0031175 neuron projection development GO|GO.BP
5 black 5 GO:0048812 neuron projection morphogenesis GO|GO.BP
6 black 6 GO:0098794 postsynapse GO|GO.CC
7 black 7 GO:0032501 multicellular organismal process GO|GO.BP
8 black 8 GO:0000902 cell morphogenesis GO|GO.BP
9 black 9 GO:0005891 voltage-gated calcium channel complex GO|GO.CC
10 black 10 GO:0048666 neuron development GO|GO.BP
11 blue 1 GO:0032501 multicellular organismal process GO|GO.BP
12 blue 2 GO:0009653 anatomical structure morphogenesis GO|GO.BP
13 blue 3 GO:0035295 tube development GO|GO.BP
14 blue 4 GO:0007275 multicellular organism development GO|GO.BP
15 blue 5 GO:0072359 circulatory system development GO|GO.BP
16 blue 6 GO:0035239 tube morphogenesis GO|GO.BP
17 blue 7 GO:0048856 anatomical structure development GO|GO.BP
18 blue 8 GO:0048646 anatomical structure formation involved in morphogenesis GO|GO.BP
19 blue 9 GO:0072358 cardiovascular system development GO|GO.BP
20 blue 10 GO:1990837 sequence-specific double-stranded DNA binding GO|GO.MF
Any ideas would be much appreciated!
Thanks!
Upvotes: 0
Views: 60
Reputation: 1495
To convert the list to a data frame, bind its elements together row by row:
do.call(rbind, top10each_final)
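As a minimal sketch of why this works where the unlist() approach does not: unlist() flattens every column of every sublist into one long vector, so the row structure is lost before matrix() sees it, whereas rbind keeps each sublist as a block of rows. The toy data frame and the names toy, top_list and stacked below are made up for illustration and are not the real table. The sketch also swaps top10each[c(1:10),] for head(), on the assumption that modules with fewer than 10 hits should not be padded with NA rows.

```r
# Toy stand-in for table.display (hypothetical data, not the real GO table)
toy <- data.frame(
  class = c(rep("black", 12), rep("blue", 3)),
  rank = c(1:12, 1:3),
  dataSetID = sprintf("GO:%07d", 1:15),
  stringsAsFactors = FALSE
)

top_list <- list()
for (module in unique(toy$class)) {
  per_module <- toy[toy$class == module, ]
  # head() returns at most 10 rows, so a module with only 3 hits
  # contributes 3 rows instead of 3 rows plus 7 rows of NA
  top_list[[module]] <- head(per_module, 10)
}

# Stack the per-module data frames vertically into one data frame
stacked <- do.call(rbind, top_list)
rownames(stacked) <- NULL # reset row names to 1..n

print(stacked)
```

Here "black" contributes 10 rows and "blue" only 3, so stacked has 13 rows and the same columns as toy.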
That said, instead of the loop above, if you were to do the following:
top10each_final = table.display[table.display$rank %in% 1:10,]
that would give you what you need: it keeps every row whose rank is 1, 2, ..., 10. This works because rank restarts at 1 within each class, so no loop or list is needed.
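A small sketch of that filter on made-up data, next to a variant that does not depend on the rank column. The variant is an assumption on my part for the case where rank were missing or not contiguous; it uses base R's ave() to number the rows within each class and assumes the rows are already sorted best-first within each class, as they are in the excerpt shown. The names toy, by_rank, pos and by_pos are hypothetical.

```r
# Toy stand-in for table.display (hypothetical data, not the real GO table)
toy <- data.frame(
  class = c(rep("black", 12), rep("blue", 3)),
  rank = c(1:12, 1:3),
  stringsAsFactors = FALSE
)

# Filter on the existing per-class rank column
by_rank <- toy[toy$rank %in% 1:10, ]

# Variant: compute each row's position within its class, then filter on that.
# Assumes rows are already ordered best-first within each class.
pos <- ave(seq_len(nrow(toy)), toy$class, FUN = seq_along)
by_pos <- toy[pos <= 10, ]

print(table(by_rank$class))
```

On this toy table both approaches keep the same 13 rows: the first 10 for "black" and all 3 for "blue".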
Upvotes: 1