Rodrigo Duarte

Reputation: 166

Transforming list to data frame in R

I apologise for what is probably a very basic question, but I don't know the right term to search for. If you could point me to another post, or help with my (convoluted) code, I'd really appreciate it.

Essentially, I've got a giant table containing gene ontology results from 22 modules of co-expression. There are 22 levels of the "class" column, so I want to end up with 220 rows. I could not find an option to limit the output of Gene Ontology terms per module in the anRichment package to the top 10 results, so I am trying to filter this giant table manually, to output only the top 10 hits per module level (if there are 10).

dim(table.display)
[1] 2388   18

table.display[1:5,1:5]
  class rank  dataSetID                     dataSetName inGroups
1 black    1 GO:0007399      nervous system development GO|GO.BP
2 black    2 GO:0045202                         synapse GO|GO.CC
3 black    3 GO:0050808            synapse organization GO|GO.BP
4 black    4 GO:0031175   neuron projection development GO|GO.BP
5 black    5 GO:0048812 neuron projection morphogenesis GO|GO.BP

table.display[2383:2388,1:5]
      class rank  dataSetID                               dataSetName inGroups
2383 yellow   54 GO:0048167         regulation of synaptic plasticity GO|GO.BP
2384 yellow   55 GO:0031226    intrinsic component of plasma membrane GO|GO.CC
2385 yellow   56 GO:0001505     regulation of neurotransmitter levels GO|GO.BP
2386 yellow   57 GO:0051960  regulation of nervous system development GO|GO.BP
2387 yellow   58 GO:0022857        transmembrane transporter activity GO|GO.MF
2388 yellow   59 GO:1903305 regulation of regulated secretory pathway GO|GO.BP

What I did was:

top10each_final <- list()      # create a new list
for (module in all_modules) {  # for clause to add the top 10 hits of each module to the empty list
  top10each <- table.display[table.display$class==module,]
  top10each_final[[module]] <- top10each[c(1:10),]
} 
top10each_final_2 <- data.frame(matrix(unlist(top10each_final), nrow=length(top10each_final), byrow=T)) # convert the list to a table

The list looks right when I view it in RStudio (it contains 22 sublists, one per level of "class"), but the conversion to a data frame does not work. Instead of a top10each_final_2 table in which the elements of the list top10each_final are stacked vertically, I get repeated values in a strange order.

top10each_final_2[1:5,1:5]
         X1        X2        X3        X4        X5
1     black     black     black     black     black
2      blue      blue      blue      blue      blue
3     brown     brown     brown     brown     brown
4      cyan      cyan      cyan      cyan      cyan
5 darkgreen darkgreen darkgreen darkgreen darkgreen

My desired final output should look similar to the input, but just containing the top 10 hits per class level. E.g.:

desired_table.display[1:20,1:5]
   class rank dataSetID  dataSetName                                              inGroups
1  black 1    GO:0007399 nervous system development                               GO|GO.BP
2  black 2    GO:0045202 synapse                                                  GO|GO.CC
3  black 3    GO:0050808 synapse organization                                     GO|GO.BP
4  black 4    GO:0031175 neuron projection development                            GO|GO.BP
5  black 5    GO:0048812 neuron projection morphogenesis                          GO|GO.BP
6  black 6    GO:0098794 postsynapse                                              GO|GO.CC
7  black 7    GO:0032501 multicellular organismal process                         GO|GO.BP
8  black 8    GO:0000902 cell morphogenesis                                       GO|GO.BP
9  black 9    GO:0005891 voltage-gated calcium channel complex                    GO|GO.CC
10 black 10   GO:0048666 neuron development                                       GO|GO.BP
11 blue  1    GO:0032501 multicellular organismal process                         GO|GO.BP
12 blue  2    GO:0009653 anatomical structure morphogenesis                       GO|GO.BP
13 blue  3    GO:0035295 tube development                                         GO|GO.BP
14 blue  4    GO:0007275 multicellular organism development                       GO|GO.BP
15 blue  5    GO:0072359 circulatory system development                           GO|GO.BP
16 blue  6    GO:0035239 tube morphogenesis                                       GO|GO.BP
17 blue  7    GO:0048856 anatomical structure development                         GO|GO.BP
18 blue  8    GO:0048646 anatomical structure formation involved in morphogenesis GO|GO.BP
19 blue  9    GO:0072358 cardiovascular system development                        GO|GO.BP
20 blue  10   GO:1990837 sequence-specific double-stranded DNA binding            GO|GO.MF

Any ideas would be much appreciated!

Thanks!

Upvotes: 0

Views: 60

Answers (1)

jav

Reputation: 1495

To convert the list to a data frame, bind its elements together row-wise:

do.call(rbind, top10each_final)
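
As a rough illustration of why this works, here is a sketch on a made-up two-module list (`toy_list` and its values are invented, not the real anRichment output). `unlist()` flattens each data frame column by column into a single vector, so refilling a matrix from it loses the row structure, whereas `rbind` stacks the data frames row by row.

```r
# Toy stand-in for top10each_final: a named list of data frames (values are made up)
toy_list <- list(
  black = data.frame(class = "black", rank = 1:2, stringsAsFactors = FALSE),
  blue  = data.frame(class = "blue",  rank = 1:2, stringsAsFactors = FALSE)
)

# Stack the list elements vertically into one data frame
stacked <- do.call(rbind, toy_list)

# rbind derives row names from the list names (e.g. "black.1");
# reset them if plain 1, 2, 3, ... numbering is preferred
rownames(stacked) <- NULL
print(stacked)
```

Note that if a module has fewer than 10 rows, indexing with `[1:10, ]` in the loop pads it with NA rows, and those rows would be carried into the stacked result.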

That said, you can skip the loop entirely and filter on the rank column instead:

top10each_final = table.display[table.display$rank %in% 1:10,]

This keeps every row whose rank is 1, 2, ..., 10, which gives the top 10 hits per module (or fewer, for modules with fewer than 10 hits).
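
A minimal sketch of the rank filter on invented data (the `toy` table below is hypothetical). It also shows an alternative that does not use the `rank` column at all and simply takes the first 10 rows per class; that variant assumes the table is already sorted by rank within each class.

```r
# Toy table (made up): module "a" has 12 hits, module "b" only 3
toy <- data.frame(
  class = c(rep("a", 12), rep("b", 3)),
  rank  = c(1:12, 1:3),
  stringsAsFactors = FALSE
)

# Filter on the rank column: keeps ranks 1..10 per module, with no NA padding
by_rank <- toy[toy$rank %in% 1:10, ]

# Alternative that ignores `rank`: split by class, take up to 10 rows each, restack
by_head <- do.call(rbind, lapply(split(toy, toy$class), head, n = 10))
rownames(by_head) <- NULL

# Count the rows kept per module
print(table(by_rank$class))
```

Both approaches return short modules as they are, without padding them to 10 rows.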

Upvotes: 1
