Reputation: 95
I have a simple data frame of 3 columns. Column 1 contains samples (A-D) collected over time. Column 2 identifies linked pairs within the data that represent interacting sites. There may be up to 10 identifier pairs in any one sample. Column 3 ranks (1-3) the strength of the interaction between the identifiers.
The first task was to bin and graph the data. As the rank values are integers, I have used geom_jitter() to distribute them at alternate locations within the bin. I then assign shape/color to each unique identifier pair so that I can follow individual pairs between the samples (i.e. whether they increase or decrease in rank).
This is what I have done to date:
Sample <- c("A","A","A","B","B","C","C","D","D","D")
Rank <- c(3,3,1,3,3,2,3,3,2,1)
Site <- c(101202,102203,101201,102203,101202,101202,102203,102203,101201,101202)
DataSet <- as.data.frame(cbind(Sample,Rank,Site))
ggplot(data=DataSet, aes(x=Sample, y=Rank, group=factor(Site), colour = factor(Site))) + geom_jitter(aes(shape = factor(Site)), size = 4) + geom_vline(xintercept=seq(0.5,length(unique(DataSet$Sample)), 1), lwd=0.5, colour="black",linetype = "dotted") + geom_hline(yintercept=seq(0.5, length(unique(DataSet$Rank))+0.5, 1), lwd=0.5, colour="black",linetype = "dotted") + scale_shape_manual(values=c(16:18)) + theme(legend.position="none", panel.background = element_rect(fill = "white"))
My question is: To further aid visualization, I would like to assign each unique identifier pair to the same location within the bin. Is there a way to achieve this?
To illustrate what I mean, please find the provided mocked-up figure.
So for example, the identifier pair in blue is ranked 1 in samples A to D and its position within the bin is the same (i.e. top left corner). The identifier pair in green changes rank but its location is to the bottom-right of the respective bin.
Upvotes: 2
Views: 61
Reputation: 13139
This might be a start. It is based on the idea that jitter is a random addition to the x and y coordinates of a point. So, if we add predetermined values to x and y, based on the site ID, points should appear in the same spot for the same sites.
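To make the idea concrete, here is a minimal sketch on toy data (the site names and offset values are made up for illustration, they are not from the question). Each site gets one fixed offset from a named lookup vector, so the same site always lands at the same spot relative to its rank:

```r
# Toy data, assumed for illustration only
site <- c("s1", "s2", "s1", "s2")
rank <- c(3, 3, 1, 2)

# One fixed vertical offset per site (arbitrary values inside +/- 0.5)
offset <- c(s1 = 0.3, s2 = -0.3)

# Deterministic "jitter": look up each row's offset by its site name
y_fixed <- rank + unname(offset[site])
```

Unlike geom_jitter(), running this twice gives identical positions, and the offset depends only on the site.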
First, we generate a dataframe of relative positions based on site-id.
position_data <- data.frame(id=1:10,
x=c(rep(seq(-0.4,0.4,length.out=3),3),0),
y = rep(seq(0.4,-0.4,length.out=4),c(3,3,3,1)))
We check whether this distribution is what we had in mind:
ggplot(position_data, aes(x=x, y=y)) + geom_text(aes(label=id))
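The hand-written offsets above cover exactly 10 sites. If the number of sites varies, a grid like this could be generated instead. This is only a sketch: make_offsets() and its ncol and spread arguments are hypothetical names of my own, not part of ggplot2, and the layout differs slightly from the one above (a leftover point in the last row sits at the left rather than in the centre).

```r
# Hypothetical helper: lay out n site positions on a grid inside one bin.
# `ncol` and `spread` are illustrative parameters, not from any package.
make_offsets <- function(n, ncol = 3, spread = 0.4) {
  nrow <- ceiling(n / ncol)
  # evenly spaced column/row offsets; a single column or row sits at 0
  xs <- if (ncol > 1) seq(-spread, spread, length.out = ncol) else 0
  ys <- if (nrow > 1) seq(spread, -spread, length.out = nrow) else 0
  i <- seq_len(n) - 1
  data.frame(id = seq_len(n),
             x  = xs[i %% ncol + 1],    # fill left to right
             y  = ys[i %/% ncol + 1])   # then top to bottom
}

grid10 <- make_offsets(10)
```

The result has the same id, x and y columns as position_data, so it can be merged in the same way.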
Then, we merge this with the original dataset:
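# generate a numeric identifier (1, 2, 3, ...) for each unique site;
# factor() makes this work whether Site is stored as character (R >= 4.0) or as factor
DataSet$Site_ID <- as.numeric(factor(DataSet$Site))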
plotdata <- merge(DataSet, position_data, by.x="Site_ID",by.y="id")
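Then we add the x and y offsets to the numeric Sample and Rank positions:
# factor() first, so that character columns (R >= 4.0) are also mapped to 1, 2, 3, ...
plotdata$y <- plotdata$y + as.numeric(factor(plotdata$Rank))
plotdata$x <- plotdata$x + as.numeric(factor(plotdata$Sample))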
Now we can plot:
ggplot(data=plotdata, aes(x=x, y=y, group=factor(Site), colour = factor(Site),shape=factor(Site))) +
geom_point(size=5)+
geom_vline(xintercept=seq(0.5,length(unique(DataSet$Sample)), 1), lwd=0.5, colour="black",linetype = "dotted") +
geom_hline(yintercept=seq(0.5, length(unique(DataSet$Rank))+0.5, 1), lwd=0.5, colour="black",linetype = "dotted") +
scale_shape_manual(values=c(16:18)) +
theme(legend.position="none", panel.background = element_rect(fill = "white")) +
scale_x_continuous(breaks=1:length(unique(plotdata$Sample)),labels=sort(unique(plotdata$Sample)),name="Sample") +
scale_y_continuous(breaks=1:length(unique(plotdata$Rank)), labels=sort(unique(plotdata$Rank)),name="Rank")
Note that our axes went from factor to numeric, so we had to add a custom scale to both.
Upvotes: 2