DrDNA

Reputation: 11

R: Melting a data frame and plotting by group

I have a large dataset that I'd like to use to plot genetic divergence along chromosomes. The data frame I am using has the following format.

ID      Group   100     270     310     430     460     550     580     660     710     740
Strain1 A       0.191   0.147   0.124   0.149   0.193   0.189   0.123   0.189   0.151   0.180
Strain2 A       0.188   0.188   0.149   0.136   0.000   0.199   0.199   0.188   0.149   0.000
Strain3 B       0.123   0.147   0.190   0.061   0.148   0.149   0.148   0.197   0.178   0.172
Strain4 B       0.147   0.197   0.188   0.178   0.179   0.149   0.191   0.154   0.179   0.187
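To make the example reproducible, the table above can be read straight into R. This is a sketch that assumes the column headers are meant to stay as the bare positions ("100", "270", ...). By default `read.table()` would rename them to `X100`, `X270`, ..., which is why `check.names = FALSE` is set.

```r
# Rebuild the question's table as a data frame named df1.
# check.names = FALSE keeps headers such as "100" unchanged; with the
# default (TRUE), read.table() would rename them to "X100", "X270", ...
df1 <- read.table(text = "
ID      Group   100     270     310     430     460     550     580     660     710     740
Strain1 A       0.191   0.147   0.124   0.149   0.193   0.189   0.123   0.189   0.151   0.180
Strain2 A       0.188   0.188   0.149   0.136   0.000   0.199   0.199   0.188   0.149   0.000
Strain3 B       0.123   0.147   0.190   0.061   0.148   0.149   0.148   0.197   0.178   0.172
Strain4 B       0.147   0.197   0.188   0.178   0.179   0.149   0.191   0.154   0.179   0.187
", header = TRUE, check.names = FALSE, stringsAsFactors = FALSE)

str(df1)
```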

I'd like to use ggplot2 to plot one line per strain, with the lines colored by group and a continuous x-axis running from chromosome position 100 to 740. I can't figure out how to melt the data without first extracting the group information and then adding it back after melting. Can anyone suggest a one-step approach?
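One way to melt in a single step is to declare both `ID` and `Group` as identifier variables, so they are repeated on every row of the long result instead of being dropped. The sketch below assumes the reshape2 package is installed and uses a two-column subset of the table. The `Position`/`Divergence` names are illustrative choices, not taken from the question.

```r
library(reshape2)

# Small subset of the question's table; check.names = FALSE keeps "100", "270"
df1 <- data.frame(
  ID = c("Strain1", "Strain2"),
  Group = c("A", "A"),
  `100` = c(0.191, 0.188),
  `270` = c(0.147, 0.188),
  check.names = FALSE
)

# ID and Group are id variables, so they are carried along with every value
long <- melt(df1, id.vars = c("ID", "Group"),
             variable.name = "Position", value.name = "Divergence")

# melt() stores the former column names as a factor; go through character
# so the positions become 100, 270 rather than the factor codes 1, 2
long$Position <- as.numeric(as.character(long$Position))
long
```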

Upvotes: 1

Views: 791

Answers (3)

DrDNA

Reputation: 11

The answer by akrun is almost there, except that there should be one line plotted for each strain. For more information, here's a link to a screenshot (sorry, I need more reputation to post the actual image) of a Shiny app I'm working on that plots chromosome similarity between a selected fungal strain and a collection of other strains that infect different host grass species: Shiny App plot. The current plot shows genetic divergence between strain 87-120 and 10 rice (Oryza)-infecting strains (in red), 7 St. Augustinegrass (Stenotaphrum)-infecting strains (in dark blue) and 8 finger millet (Eleusine)-infecting strains (in light blue).

My current problem is that the x-axis values do not represent chromosome positions (they are the analysis window numbers instead). I need to melt (or gather) the data frame so that I can use the chromosome positions in the column headers for the x-axis and the Group information for the color.
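A sketch of both requirements together: the headers become a numeric x-axis, and each strain gets its own line colored by its group. The key change from akrun's answer is mapping `group = ID` rather than `group = Group`. This assumes tidyr 1.1.0 or later for `pivot_longer()`'s `names_transform` argument; `pivot_longer()` is tidyr's newer replacement for `gather()`. The data is a subset of the question's table.

```r
library(tidyr)
library(ggplot2)

# Subset of the question's table (first three positions only)
df1 <- data.frame(
  ID = c("Strain1", "Strain2", "Strain3", "Strain4"),
  Group = c("A", "A", "B", "B"),
  `100` = c(0.191, 0.188, 0.123, 0.147),
  `270` = c(0.147, 0.188, 0.147, 0.197),
  `310` = c(0.124, 0.149, 0.190, 0.188),
  check.names = FALSE
)

# Every column except ID and Group is a position; ID and Group are kept as-is.
# names_transform converts the header text ("100", ...) into numbers.
long <- pivot_longer(
  df1,
  cols = -c(ID, Group),
  names_to = "Position",
  values_to = "Divergence",
  names_transform = list(Position = as.numeric)
)

# group = ID draws one line per strain; colour = Group colors strains by group
p <- ggplot(long, aes(x = Position, y = Divergence,
                      group = ID, colour = Group)) +
  geom_line() +
  labs(x = "Chromosome position", y = "Genetic divergence")
print(p)
```

Because `Position` is numeric, the spacing between 100, 270 and 310 on the axis reflects the actual distances rather than equal steps.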

Upvotes: 0

neilfws

Reputation: 33782

I think this will work best if you colour by Group and facet on Strain. Assuming the data frame is named mydata:

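library(tidyr)    # gather(); tidyr also re-exports the %>% pipe
library(ggplot2)

mydata %>% 
  # Reshape to long format, keeping ID and Group as identifier columns.
  # Var holds the former column headers as character, so the x-axis is discrete.
  gather(Var, Val, -Group, -ID) %>% 
  ggplot(aes(Var, Val)) + 
  geom_line(aes(color = Group, group = Group)) + 
  # One panel per strain (ID), so each panel contains a single strain's line
  facet_grid(ID ~ .)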


Upvotes: 1

akrun

Reputation: 887148

We could gather into 'long' format and then plot with ggplot:

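library(ggplot2)
library(dplyr)
library(tidyr)

# Gather columns 3 to ncol (the position columns) into key/value pairs
gather(df1, key, val, 3:ncol(df1)) %>% 
   # The keys are the column headers (character); make them numeric for a continuous x-axis
   mutate(key = as.numeric(key)) %>%
   ggplot() + 
     # group = Group joins all points belonging to a group into a single line
     geom_line(aes(x = key, y = val, group = Group, color = Group))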

Upvotes: 1
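As a variation on this answer, `gather()` has a `convert` argument that runs `type.convert()` on the new key column, so the headers become numbers without a separate `mutate()` step. This is a sketch on a subset of the question's table, assuming tidyr is installed and the headers were kept as bare numbers (for example with `check.names = FALSE`).

```r
library(tidyr)

# Subset of the question's table, one strain from each group
df1 <- data.frame(
  ID = c("Strain1", "Strain3"),
  Group = c("A", "B"),
  `100` = c(0.191, 0.123),
  `270` = c(0.147, 0.147),
  check.names = FALSE
)

# convert = TRUE turns the key column ("100", "270") into numbers
long <- gather(df1, key, val, 3:ncol(df1), convert = TRUE)
str(long)
```

With a numeric `key`, the plotting call above draws one line per strain, as the question asks, if the grouping is switched to the strain column: `geom_line(aes(x = key, y = val, group = ID, color = Group))`.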
