Manipulating the location of data points in a plot

Question

I have a simple data frame of 3 columns. Column 1 contains samples (A-D) collected over time. Column 2 identifies linked pairs within the data that represent interacting sites. There may be up to 10 identifier pairs in any one sample. Column 3 ranks (1-3) the strength of the interaction between the identifiers.

The first task was to bin and graph the data. As the rank values are integers, I have used geom_jitter() to distribute them at alternate locations within the bin. I then assign shape/color to each unique identifier pair so that I can follow individual pairs between the samples (i.e. whether they increase or decrease in rank).

This is what I have done to date:

library(ggplot2)

Sample <- c("A","A","A","B","B","C","C","D","D","D")
Rank <- c(3,3,1,3,3,2,3,3,2,1)
Site <- c(101202,102203,101201,102203,101202,101202,102203,102203,101201,101202)
DataSet <- as.data.frame(cbind(Sample,Rank,Site))
ggplot(data=DataSet, aes(x=Sample, y=Rank, group=factor(Site), colour = factor(Site))) +
  geom_jitter(aes(shape = factor(Site)),  size = 4) +
  geom_vline(xintercept=seq(0.5,length(unique(DataSet$Sample)), 1), lwd=0.5, colour="black",linetype = "dotted") +
  geom_hline(yintercept=seq(0.5, length(unique(DataSet$Rank))+0.5, 1), lwd=0.5, colour="black",linetype = "dotted") +
  scale_shape_manual(values=c(16:18)) +
  theme(legend.position="none", panel.background = element_rect(fill = "white"))

My question is: To further aid visualization, I would like to assign each unique identifier pair to the same location within the bin. Is there a way to achieve this?

To illustrate what I mean, I have provided a mocked-up figure.

So for example, the identifier pair in blue is ranked 1 in samples A to D and its position within the bin is the same (i.e. top left corner). The identifier pair in green changes rank but its location is to the bottom-right of the respective bin.

Heroka · Accepted Answer

This might be a start. It is based on the idea that jitter is a random addition to the x and y coordinates of a point. So, if we add predetermined values to x and y, based on the site ID, points should appear in the same spot for the same sites.
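To make the idea concrete, here is a minimal base-R sketch contrasting random jitter with a fixed per-site offset. The names `offsets`, `random_offset` and `fixed_offset` and the offset values are made up for illustration; they are not part of the question's data.

```r
# Illustrative only: random jitter vs. a predetermined offset per site.
site <- c("101202", "102203", "101201", "102203", "101202")

# Random jitter: the same site gets a different offset on every row.
set.seed(1)
random_offset <- runif(length(site), min = -0.4, max = 0.4)

# Fixed offset: one hypothetical value per site, looked up by name.
offsets <- c("101201" = -0.4, "101202" = 0, "102203" = 0.4)
fixed_offset <- unname(offsets[site])

# Rows 1 and 5 belong to the same site, so their fixed offsets agree.
fixed_offset[1] == fixed_offset[5]
```

The lookup is the whole trick: once the offset depends only on the site, a site lands in the same spot in every bin.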

First, we generate a data frame of relative positions, keyed by site ID.

position_data <- data.frame(id=1:10, 
                            x=c(rep(seq(-0.4,0.4,length.out=3),3),0),
                            y = rep(seq(0.4,-0.4,length.out=4),c(3,3,3,1)))

We check whether this distribution is what we had in mind:

ggplot(position_data, aes(x=x, y=y)) + geom_text(aes(label=id))
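Besides looking at the plot, the layout can also be checked without graphics. The sketch below rebuilds the same `position_data` and tests two properties that seem desirable here: no two ids share a slot, and every offset stays inside the bin (strictly less than 0.5 from the centre). The names `no_duplicates` and `inside_bin` are introduced only for this example.

```r
# Same layout as above, rebuilt so this snippet runs on its own.
position_data <- data.frame(
  id = 1:10,
  x  = c(rep(seq(-0.4, 0.4, length.out = 3), 3), 0),
  y  = rep(seq(0.4, -0.4, length.out = 4), c(3, 3, 3, 1))
)

# Each id should occupy its own (x, y) slot.
no_duplicates <- !any(duplicated(position_data[, c("x", "y")]))

# Offsets must stay within half a unit, or points spill into the next bin.
inside_bin <- all(abs(position_data$x) < 0.5 & abs(position_data$y) < 0.5)

no_duplicates && inside_bin
```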

Then, we merge this with the original dataset:

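# Generate an identifier 1..n per site. Going through factor() makes this
# work whether Site is stored as a factor or as character
# (cbind() coerces every column to character).
DataSet$Site_ID <- as.numeric(factor(DataSet$Site))

plotdata <- merge(DataSet, position_data, by.x="Site_ID",by.y="id")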
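Because `merge()` performs an inner join by default, a site whose ID has no match in `position_data` would silently drop out of the plot. A quick sanity check along these lines may help. Note the assumption: for brevity this sketch builds the data with `data.frame()` rather than `cbind()`, and `rows_kept` and `offsets_per_site` are names made up for the example.

```r
# Self-contained copy of the data (built with data.frame() for this sketch).
Sample <- c("A","A","A","B","B","C","C","D","D","D")
Rank   <- c(3,3,1,3,3,2,3,3,2,1)
Site   <- c(101202,102203,101201,102203,101202,101202,102203,102203,101201,101202)
DataSet <- data.frame(Sample, Rank, Site)

position_data <- data.frame(
  id = 1:10,
  x  = c(rep(seq(-0.4, 0.4, length.out = 3), 3), 0),
  y  = rep(seq(0.4, -0.4, length.out = 4), c(3, 3, 3, 1))
)

# Site IDs 1..n in sorted order of the site values.
DataSet$Site_ID <- as.integer(factor(DataSet$Site))
plotdata <- merge(DataSet, position_data, by.x = "Site_ID", by.y = "id")

# No rows should be lost in the join.
rows_kept <- nrow(plotdata) == nrow(DataSet)

# Each site should map to exactly one (x, y) offset.
offsets_per_site <- unique(plotdata[, c("Site", "x", "y")])
```

With more than 10 distinct sites, `rows_kept` would turn `FALSE`, which is the signal to extend `position_data`.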

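Then we add the x and y offsets to the numeric Sample and Rank positions:

# as.character() first, so Rank converts to its value rather than a factor level code
plotdata$y <- plotdata$y + as.numeric(as.character(plotdata$Rank))
# factor() first, so Sample maps to A=1, B=2, ... even when stored as character
plotdata$x <- plotdata$x + as.numeric(factor(plotdata$Sample))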
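One property worth keeping in mind: as long as every offset is smaller than 0.5 in absolute value, rounding the shifted coordinate recovers the original bin, so no point can drift into a neighbouring cell. A tiny sketch with made-up values (`rank`, `y_offset` and `recovered` are illustrative names only):

```r
# Illustrative values: four ranks and four offsets within (-0.5, 0.5).
rank     <- c(3, 3, 1, 2)
y_offset <- c(0.4, -0.4, 0.4 / 3, -0.4 / 3)

# Shifted plotting position.
y <- rank + y_offset

# Rounding gets the original rank back, so each point stays in its bin.
recovered <- round(y)
all(recovered == rank)
```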

Now we can plot:

ggplot(data=plotdata, aes(x=x, y=y, group=factor(Site), colour = factor(Site),shape=factor(Site))) +
  geom_point(size=5)+
  geom_vline(xintercept=seq(0.5,length(unique(DataSet$Sample)), 1), lwd=0.5, colour="black",linetype = "dotted") + 
  geom_hline(yintercept=seq(0.5, length(unique(DataSet$Rank))+0.5, 1), lwd=0.5, colour="black",linetype = "dotted") + 
  scale_shape_manual(values=c(16:18)) + 
  theme(legend.position="none", panel.background = element_rect(fill = "white")) +
  scale_x_continuous(breaks=1:length(unique(plotdata$Sample)),labels=sort(unique(plotdata$Sample)),name="Sample") +
  scale_y_continuous(breaks=1:length(unique(plotdata$Rank)), labels=sort(unique(plotdata$Rank)),name="Rank")

Note that our axes went from factor to numeric, so we had to add a custom scale to both.
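As a sketch of where those breaks and labels come from: converting a factor to integers gives the positions 1..n, and the factor's levels give the matching labels. The names `sample_factor`, `x_breaks` and `x_labels` are introduced here for illustration.

```r
Sample <- c("A","A","A","B","B","C","C","D","D","D")

# Levels are sorted alphabetically: A, B, C, D.
sample_factor <- factor(Sample)

# Numeric position of each observation on the x axis (A = 1, B = 2, ...).
x_numeric <- as.integer(sample_factor)

# One break per level, labelled with the level itself.
x_breaks <- seq_along(levels(sample_factor))
x_labels <- levels(sample_factor)

# These could then be passed on as
# scale_x_continuous(breaks = x_breaks, labels = x_labels, name = "Sample")
```

Taking both breaks and labels from the same factor keeps them aligned even if a sample has no observations in the plotted subset.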

Result:
