crazysantaclaus

Reputation: 633

Creating a factor-based dendrogram with R and ggplot2

This is less a coding question than a call for help with the general approach ;-) I prepared a table containing taxonomic information about organisms. However, I only want to use the "names" of these organisms, so there are no values from which a distance or clustering could be computed (and this is all the information I have). I just want to use these factors to create a plot that shows how the organisms are related. My data looks like this:

```r
test2 <- structure(list(
  genus = structure(c(4L, 2L, 7L, 8L, 6L, 1L, 3L, 5L, 5L),
    .Label = c("Aminobacter", "Bradyrhizobium", "Hoeflea", "Hyphomonas",
               "Mesorhizobium", "Methylosinus", "Ochrobactrum", "uncultured"),
    class = "factor"),
  family = structure(c(4L, 1L, 2L, 3L, 5L, 6L, 6L, 6L, 6L),
    .Label = c("Bradyrhizobiaceae", "Brucellaceae", "Hyphomicrobiaceae",
               "Hyphomonadaceae", "Methylocystaceae", "Phyllobacteriaceae"),
    class = "factor"),
  order = structure(c(1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
    .Label = c("Caulobacterales", "Rhizobiales"), class = "factor"),
  class = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
    .Label = "Alphaproteobacteria", class = "factor"),
  phylum = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
    .Label = "Proteobacteria", class = "factor")),
  .Names = c("genus", "family", "order", "class", "phylum"),
  class = "data.frame", row.names = c(NA, 9L))
```

Is it necessary to set up artificial values that describe a distance between the levels?

Upvotes: 0

Views: 377

Answers (1)

missuse

Reputation: 19756

Here is an attempt using the data.tree library.

First, create a string variable (`pathString`) that holds each row's ranks from phylum down to genus, separated by `/`, e.g. `Proteobacteria/Alphaproteobacteria/Caulobacterales/Hyphomonadaceae/Hyphomonas`:

```r
library(data.tree)

# One "/"-separated path per row, from phylum down to genus
test2$pathString <- with(test2,
                         paste(phylum, class, order, family, genus, sep = "/"))
```
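The `paste()` step can be checked in base R alone, without data.tree. `paste()` converts each factor to its labels (not its integer codes), so the first row gives exactly the path shown above. To keep the snippet self-contained it rebuilds `test2` with `data.frame()` instead of the `structure()` call; the labels and row order are the same as in the question.

```r
# Rebuild the question's data; labels and row order match the dput() in the question
test2 <- data.frame(
  genus  = c("Hyphomonas", "Bradyrhizobium", "Ochrobactrum", "uncultured",
             "Methylosinus", "Aminobacter", "Hoeflea",
             "Mesorhizobium", "Mesorhizobium"),
  family = c("Hyphomonadaceae", "Bradyrhizobiaceae", "Brucellaceae",
             "Hyphomicrobiaceae", "Methylocystaceae",
             rep("Phyllobacteriaceae", 4)),
  order  = c("Caulobacterales", rep("Rhizobiales", 8)),
  class  = rep("Alphaproteobacteria", 9),
  phylum = rep("Proteobacteria", 9),
  stringsAsFactors = TRUE
)

# Same paste() as above: ranks from phylum down to genus, joined by "/"
test2$pathString <- with(test2, paste(phylum, class, order, family, genus, sep = "/"))

test2$pathString[1]
# [1] "Proteobacteria/Alphaproteobacteria/Caulobacterales/Hyphomonadaceae/Hyphomonas"

# Rows 8 and 9 (both Mesorhizobium) share a path, so there are 8 distinct paths
length(unique(test2$pathString))
# [1] 8
```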

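```r
# as.Node() builds the tree from the pathString column;
# plot() on a data.tree Node needs the DiagrammeR package installed
tree_test2 <- as.Node(test2)
plot(tree_test2)
```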


Many other things can be done with the tree afterwards, for example:

Interactive network:

```r
library(networkD3)
# Flatten the tree into a from/to edge table, then draw it as a force-directed network
test2_Network <- ToDataFrameNetwork(tree_test2, "name")
simpleNetwork(test2_Network)
```

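`ToDataFrameNetwork()` flattens the tree into a table of parent/child pairs (its `from` and `to` columns), and `simpleNetwork()` draws that table. As a sketch of what such a table contains, the base R snippet below builds a similar one directly from the rank columns, without data.tree; `ranks` and `edges` are illustrative names. It assumes every label is unique across the whole table, which holds here but would break if, say, a generic genus label like "uncultured" appeared under two different families.

```r
# Rebuild the question's data; labels and row order match the dput() in the question
test2 <- data.frame(
  genus  = c("Hyphomonas", "Bradyrhizobium", "Ochrobactrum", "uncultured",
             "Methylosinus", "Aminobacter", "Hoeflea",
             "Mesorhizobium", "Mesorhizobium"),
  family = c("Hyphomonadaceae", "Bradyrhizobiaceae", "Brucellaceae",
             "Hyphomicrobiaceae", "Methylocystaceae",
             rep("Phyllobacteriaceae", 4)),
  order  = c("Caulobacterales", rep("Rhizobiales", 8)),
  class  = rep("Alphaproteobacteria", 9),
  phylum = rep("Proteobacteria", 9),
  stringsAsFactors = TRUE
)

# Ranks ordered from the root (phylum) down to the leaves (genus)
ranks <- c("phylum", "class", "order", "family", "genus")

# Pair each rank with the one below it, then keep only distinct parent/child pairs
edges <- do.call(rbind, lapply(seq_len(length(ranks) - 1), function(i) {
  data.frame(from = as.character(test2[[ranks[i]]]),
             to   = as.character(test2[[ranks[i + 1]]]),
             stringsAsFactors = FALSE)
}))
edges <- unique(edges)
rownames(edges) <- NULL

nrow(edges)                              # 17 edges ...
length(unique(c(edges$from, edges$to)))  # ... for 18 distinct names, as expected for a tree
```

With networkD3 installed, `simpleNetwork(edges)` should draw a comparable network, since by default it takes the first column as source and the second as target.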

Or as a static graph with igraph:

```r
library(igraph)
plot(as.igraph(tree_test2, directed = TRUE, direction = "climb"))
```

Check out the data.tree vignette for more options.


Using ggplot2 (via the ggraph package):

```r
library(ggraph)  # ggraph depends on ggplot2, so aes(), arrow() and unit() are available
graph <- as.igraph(tree_test2, directed = TRUE, direction = "climb")

ggraph(graph, layout = 'kk') +                    # Kamada-Kawai layout
  geom_node_text(aes(label = name)) +
  geom_edge_link(arrow = arrow(type = "closed", ends = "first",
                               length = unit(0.20, "inches"),
                               angle = 15)) +
  geom_node_point() +
  theme_graph() +
  coord_cartesian(xlim = c(-3, 3), expand = TRUE)  # fix the x-axis range
```


Or perhaps, with repelled labels and no arrows:

```r
ggraph(graph, layout = 'kk') +
  geom_node_text(aes(label = name), repel = TRUE) +  # non-overlapping labels
  geom_edge_link(angle_calc = 'along',
                 end_cap = circle(3, 'mm')) +        # stop edges 3 mm before the end node
  geom_node_point(size = 5) +
  theme_graph() +
  coord_cartesian(xlim = c(-3, 3), expand = TRUE)
```

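On the question of artificial distances: the data.tree approach needs none. If a classic `hclust`/dendrogram object is wanted anyway, one simple option (an assumption of this sketch, not something data.tree does) is to count the ranks at which two organisms differ. The base R snippet below does this with a hypothetical helper `tax_dist()` and then clusters with `hclust()`; it rebuilds `test2` with `data.frame()` so it runs on its own.

```r
# Rebuild the question's data; labels and row order match the dput() in the question
test2 <- data.frame(
  genus  = c("Hyphomonas", "Bradyrhizobium", "Ochrobactrum", "uncultured",
             "Methylosinus", "Aminobacter", "Hoeflea",
             "Mesorhizobium", "Mesorhizobium"),
  family = c("Hyphomonadaceae", "Bradyrhizobiaceae", "Brucellaceae",
             "Hyphomicrobiaceae", "Methylocystaceae",
             rep("Phyllobacteriaceae", 4)),
  order  = c("Caulobacterales", rep("Rhizobiales", 8)),
  class  = rep("Alphaproteobacteria", 9),
  phylum = rep("Proteobacteria", 9),
  stringsAsFactors = TRUE
)

ranks <- c("phylum", "class", "order", "family", "genus")

# Hypothetical helper: for every pair of rows, count the ranks at which the labels
# differ (0 = identical rows). With nested labels this equals the number of steps
# up to the lowest shared rank.
tax_dist <- function(df, ranks) {
  m <- as.matrix(df[ranks])            # factor columns become a character matrix
  labs <- as.character(df$genus)
  n <- nrow(m)
  d <- matrix(0, n, n, dimnames = list(labs, labs))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d[i, j] <- sum(m[i, ] != m[j, ])
    }
  }
  as.dist(d)
}

taxa <- unique(test2)                  # drop the duplicated Mesorhizobium row
d <- tax_dist(taxa, ranks)
hc <- hclust(d, method = "average")

# Base R dendrogram, drawn sideways so the genus labels stay readable
plot(as.dendrogram(hc), horiz = TRUE)
```

With nested labels this count behaves like a tree-shaped (ultrametric) distance, so average, single and complete linkage should all recover the same taxonomic grouping; `"average"` is an arbitrary choice here. To draw the result with ggplot2, a package such as ggdendro (`ggdendrogram(hc)`) could be used.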

Upvotes: 4
