Reputation: 7664
What I would like to do is:
a) have the plot produced by the ggplot code be the same each time it runs [set.seed kind of notion?] and
b) have text labels jittered only for labels that have the same y-axis value -- leave the other text labels alone. This would seem to be some kind of conditional jittering based on a factor value for the points.
Here is some data:
dput(df)
structure(list(Firm = c("a verylongname", "b verylongname", "c verylongname",
"d verylongname", "e verylongname", "f verylongname", "g verylongname",
"h verylongname", "i verylongname", "j verylongname"), Sum = c(74,
77, 79, 82, 85, 85, 88, 90, 90, 92)), .Names = c("Firm", "Sum"
), row.names = c(NA, 10L), class = "data.frame")
Here is the ggplot code using df:
library(ggplot2)
ggplot(df, aes(x = reorder(Firm, Sum, mean), y = Sum)) +
  geom_text(aes(label = Firm), size = 3, show.legend = FALSE, position = position_jitter(height = .9)) +
  theme(axis.text.x = element_blank()) +
  scale_x_discrete(expand = c(-1.1, 0)) + # to show the lower left name fully
  labs(x = "", y = "", title = "")
Notice that in one version of the plot, h and i still overlap, and each time I run the above code the locations of the text labels change.
BTW, this question (conditional jitter) shifts the discrete values on the x-axis a bit, but I would like to shift only the overlapping points, and only along the y-axis.
Upvotes: 4
Views: 1848
Reputation: 93851
One option is to add a column to mark overlapping points and then plot those separately. A better option might be to directly shift the y-values of the overlapping points, so that we get direct control over their placement. I show both options below.
Option 1 (jitter): First, add a column to mark overlaps. In this case, because the points pretty much fall on a line, we can mark any points as overlapping if their y-values are too close. You can include more complex conditions if it's important to check whether the x-values are close as well.
df$overlap = sapply(seq_len(nrow(df)), function(i) {
  # Flag row i if any other row's Sum is within 1 unit of it
  if(min(abs(df[i, "Sum"] - df$Sum[-i])) <= 1) "Overlap" else "Ignore"
})
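For the exact-tie case in the question (labels that share the same y-value), the flag can also be computed without a loop. This is a minimal base-R sketch rather than part of the original code; `sums`, `tied` and `overlap` are illustrative names, and it only catches exact duplicates, not values that are merely close.

```r
# Sum values from the question's data frame
sums <- c(74, 77, 79, 82, 85, 85, 88, 90, 90, 92)

# duplicated() marks the 2nd, 3rd, ... occurrence of a value; scanning from
# both ends and combining the results marks every member of a tie
tied <- duplicated(sums) | duplicated(sums, fromLast = TRUE)

overlap <- ifelse(tied, "Overlap", "Ignore")
```

Use the distance-based check when near-overlaps matter; the duplicate check is enough when only identical values collide.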
In the plot, I've colored the jittered points red so it's easy to tell which ones were affected.
# Add set.seed() here to make jitter reproducible
ggplot(df, aes(x = reorder(Firm, Sum, mean))) +
  geom_text(data=df[df$overlap=="Overlap",],
            aes(label = Firm, y = Sum), size = 3,
            position = position_jitter(width=0, height = 1), colour="red") +
  geom_text(data=df[df$overlap=="Ignore",],
            aes(label = Firm, y = Sum), size = 3) +
  theme(axis.text.x = element_blank()) +
  scale_x_discrete(expand = c(-1.1, 0)) + # to show the lower left name fully
  labs(x = "", y = "", title = "")
Option 2 (direct placement): Another option is to directly control how much the labels are shifted, rather than taking whatever jitter happens to give us. In this case, we know that we want to shift each pair of points with the same y-value. More complex logic would be necessary in cases where we need to worry about both x and y values, more than two points in the same overlap, and/or where we need to shift values that are close, but not exactly the same.
library(dplyr)
# Create a new column that shifts pairs of points with the same y-value by +/- 0.25
# (c(-0.25, 0.25) assumes each tie contains exactly two points)
df = df %>% group_by(Sum) %>%
  mutate(SumNoOverlap = if(n()>1) Sum + c(-0.25,0.25) else Sum) %>%
  ungroup()
ggplot(df, aes(x = reorder(Firm, Sum, mean), y = SumNoOverlap)) +
  geom_text(aes(label = Firm), size = 3) +
  theme(axis.text.x = element_blank()) +
  scale_x_discrete(expand = c(-1.1, 0)) + # to show the lower left name fully
  labs(x = "", y = "", title = "")
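The pairwise shift handles ties of exactly two labels. For the "more than two points in the same overlap" case mentioned above, one possible generalization is to spread each group of tied values evenly around the shared value. This is a sketch in base R under that assumption; `spread_ties` and its `step` argument are hypothetical names, not part of dplyr or ggplot2.

```r
# Hypothetical helper: spread tied values symmetrically around their shared value
spread_ties <- function(y, step = 0.5) {
  # ave() applies FUN within each group of equal y-values and returns
  # the results in the original row order
  offsets <- ave(y, y, FUN = function(v) {
    n <- length(v)
    # Centered, evenly spaced offsets: 0 for n = 1, -step/2 and +step/2 for n = 2
    (seq_len(n) - (n + 1) / 2) * step
  })
  y + offsets
}

shifted <- spread_ties(c(74, 85, 85, 90, 90, 90))
```

With the default `step = 0.5`, a pair is shifted by -0.25 and +0.25, which matches the dplyr code; a group of three is placed at -0.5, 0 and +0.5.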
Note: To make jitter reproducible, add set.seed(153) (or whatever seed value you want) before the jittered plot code.
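Seeding works because jitter is just random noise drawn from R's random number generator, so fixing the seed fixes the noise. The sketch below shows this in base R by precomputing the jitter for tied values only. `jitter_ties` is a hypothetical helper, not a ggplot2 function, and uniform noise within +/- `amount` is an assumption about the desired spread.

```r
# Hypothetical helper: jitter only the values that are tied, reproducibly
jitter_ties <- function(y, amount = 1, seed = 153) {
  set.seed(seed)  # same seed -> same random draws (note: resets the global RNG state)
  tied <- duplicated(y) | duplicated(y, fromLast = TRUE)
  out <- y
  # Untied values are left untouched; tied values get uniform noise
  out[tied] <- y[tied] + runif(sum(tied), min = -amount, max = amount)
  out
}

sums <- c(74, 77, 79, 82, 85, 85, 88, 90, 90, 92)
y1 <- jitter_ties(sums)
y2 <- jitter_ties(sums)
same_each_run <- identical(y1, y2)
```

The result can be mapped to y in geom_text() with no position argument. In ggplot2 3.0.0 and later, position_jitter() also accepts a seed argument, which avoids a separate set.seed() call.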
Upvotes: 4