Reputation: 53
I have a plot that looks like this:
It was generated with the code below:
longData<-structure(list(Var1 = c(6L, 4L, 10L, 4L, 10L, 4L, 10L, 4L, 10L, 4L, 10L, 4L, 10L, 4L, 10L, 4L, 10L, 4L, 10L, 4L, 10L, 4L, 10L, 4L, 10L, 4L, 10L, 4L, 10L, 4L, 10L, 4L, 10L, 4L, 10L, 4L, 10L, 4L, 10L, 4L, 10L, 4L, 10L, 4L, 10L, 4L, 10L, 4L, 10L, 5L, 1L, 10L, 4L, 10L, 4L, 10L, 4L, 10L, 4L, 10L, 4L, 10L, 4L, 10L, 4L, 10L, 4L, 10L, 4L, 10L, 4L, 10L, 4L, 10L, 4L, 10L, 4L, 10L, 1L, 10L, 4L, 10L, 4L, 10L, 4L, 10L, 4L, 10L, 4L, 10L, 4L, 10L, 4L, 10L, 4L, 10L, 4L, 10L, 4L, 10L, 4L, 10L, 4L, 10L, 4L),
Var2 = 1:105, value = c(6.41613198900092, 2.84458852788571, 3.9878654949938, 2.84458852788571, 3.9878654949938, 2.84458852788571, 3.9878654949938, 2.84458852788571, 3.9878654949938, 2.84458852788571, 3.9878654949938, 2.84458852788571, 3.9878654949938, 2.84458852788571, 3.9878654949938, 2.84458852788571, 3.9878654949938, 2.84458852788571, 3.9878654949938, 2.84458852788571, 3.9878654949938, 2.84458852788571, 3.9878654949938, 2.84458852788571, 3.9878654949938, 2.84458852788571, 3.9878654949938, 3.67686146649183, 5.00283179158014, 3.67686146649183, 5.00283179158014, 3.67686146649183, 2.74160383824537, 3.67686146649183, 2.74160383824537, 3.67686146649183, 3.99931926480599, 3.67686146649183, 3.99931926480599, 3.67686146649183, 3.99931926480599, 3.67686146649183, 4.35362802335279, 3.67686146649183, 4.35362802335279, 3.67686146649183, 4.35362802335279, 3.67686146649183, 4.35362802335279, 3.99724328049621, 4.57754674528668, 4.35362802335279, 3.67686146649183, 4.00444938820912, 3.79529789699833, 4.00444938820912, 3.79529789699833, 4.00444938820912, 3.79529789699833, 4.00444938820912, 3.79529789699833, 4.00444938820912, 3.79529789699833, 4.00444938820912, 3.79529789699833, 4.00444938820912, 3.79529789699833, 3.81138375279554, 3.79529789699833, 3.81138375279554, 3.79529789699833, 6.47487593052109, 3.79529789699833, 6.47487593052109, 3.79529789699833, 5.20602718404916, 3.79529789699833, 5.20602718404916, 4.57754674528668, 5.20602718404916, 3.98269499936379, 4.87403303366088, 3.98269499936379, 4.87403303366088, 3.98269499936379, 4.87403303366088, 3.98269499936379, 4.87403303366088, 3.98269499936379, 4.87403303366088, 3.98269499936379, 4.87403303366088, 3.98269499936379, 4.36554132712456, 3.65333094050839, 4.36554132712456, 3.65333094050839, 4.36554132712456, 3.65333094050839, 4.36554132712456, 3.65333094050839, 4.36554132712456, 3.65333094050839, 4.36554132712456, 3.65333094050839)),
row.names = c(6L, 16L, 34L, 40L, 58L, 64L, 82L, 88L, 106L, 112L, 130L, 136L, 154L, 160L, 178L, 184L, 202L, 208L, 226L, 232L, 250L, 256L, 274L, 280L, 298L, 304L, 322L, 328L, 346L, 352L, 370L, 376L, 394L, 400L, 418L, 424L, 442L, 448L, 466L, 472L, 490L, 496L, 514L, 520L, 538L, 544L, 562L, 568L, 586L, 593L, 601L, 622L, 628L, 646L, 652L, 670L, 676L, 694L, 700L, 718L, 724L, 742L, 748L, 766L, 772L, 790L, 796L, 814L, 820L, 838L, 844L, 862L, 868L, 886L, 892L, 910L, 916L, 934L, 937L, 958L, 964L, 982L, 988L, 1006L, 1012L, 1030L, 1036L, 1054L, 1060L, 1078L, 1084L, 1102L, 1108L, 1126L, 1132L, 1150L, 1156L, 1174L, 1180L, 1198L, 1204L, 1222L, 1228L, 1246L, 1252L), class = "data.frame")
library(ggplot2)
library(RColorBrewer)  # provides brewer.pal()
longData$value <- round(longData$value)
myPalette <- colorRampPalette(rev(brewer.pal(11, "Spectral")))
sc <- scale_fill_gradientn(colours = myPalette(7))
ggplot(data = longData, aes(x = Var2, y = Var1)) +
  geom_path(linetype = "dashed") +
  geom_point(shape = 21, size = 7, aes(fill = value)) +
  sc +
  scale_y_continuous(breaks = 1:12, labels = c("Path1-49","Path2-49","Path3-49","CorrPath-49","Path5-49","UnkownPath-49","Path1-51","Path2-51","Path3-51","CorrPath-51","Path5-51","UnkownPath-51"))
Now I want to color the dashed lines like this:
if (color of current geom_point == color of next geom_point)
    set the line color to the color of geom_point
else
    set the line color to black
How can I do this? Thanks in advance.
Upvotes: 0
Views: 368
Reputation: 13833
I have found an imperfect, yet workable solution. Thank you for sharing your dataset, but as I pointed out in the comments, it does not contain any points that satisfy the criteria in your original question. With that said, I'll answer the question using a made-up dataset similar to your own:
set.seed(54321)
df <- data.frame(
x=1:50,
y=sample(c('Path1', 'Path2', 'Path3'), 50, replace=TRUE),
value=as.character(sample(1:5, 50, replace=TRUE))
)
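As you posed, you want a way of drawing a line through all your data. Points are colored according to a value, and the logic behind the color of the line is as follows: if the current point and the next point have the same color, the line between them takes that color; otherwise the line is black.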
For our purposes, df$x will be the x axis and df$y will be the y axis. I made df$y discrete to match the OP's case. Critically: I have also made df$value discrete. Since the OP is intending to use this to compare two points based on the logic above, it's important to force the comparison among discrete values or "binned" values rather than comparing two continuous values. This is due to unexpected results when comparing two doubles. As an example, 1.0000000000000001==1.00000000000000001 evaluates to be TRUE in the console, even though it should be FALSE, whereas both of those numbers would lie within a "bin" that was 0.999 to 1.001.
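To make the floating-point caveat concrete, here is a minimal base R sketch. The variable names are only illustrative; the second comparison (0.1 + 0.2 against 0.3) is a standard example that is not part of the original answer.

```r
# Two literals that differ on paper but are stored as the same double
a <- 1.0000000000000001
b <- 1.00000000000000001
a == b                      # TRUE: the difference is below double precision

# The opposite surprise: values that "should" be equal are not
x <- 0.1 + 0.2
x == 0.3                    # FALSE: the sum carries a tiny rounding error

# Comparing with a tolerance (or on binned/discrete values) avoids both issues
isTRUE(all.equal(x, 0.3))   # TRUE
```

This is why the comparison between neighbouring points is done on a discrete column rather than on raw doubles.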
A simple plot is below. The goal is to change the dotted line according to the logic above:
library(ggplot2)
g <- ggplot(df, aes(x,y)) + theme_bw() +
  scale_fill_manual(values = rainbow(5)) +
  scale_color_manual(values = rainbow(5))
g + geom_path(group=1, color='gray50', linetype=2) +
  geom_point(shape=21, size=4, aes(fill=value))
At first I thought we could just set color=value to control color and group=1 to control connectivity and we'd be all set... but that doesn't quite work properly:
g + geom_path(group=1, aes(color=value)) +
  geom_point(shape=21, size=4, aes(fill=value))
The problem is that the line color always follows df$value, whereas we want the line to be black or gray where df$value changes and colored only where df$value stays constant. In essence, color-changing was not the problem, connectivity was. So I wrote connect_check() and used it to create another column in the dataset to control connectivity.
connect_check <- function(x) {
  # returns one group id per element; the id increments whenever the value changes
  return_vector <- vector(length=length(x), mode='double')
  grp_num <- 1
  previous <- x[1]
  for (i in seq_along(x)) {  # seq_along() also copes with an empty x
    if (x[i] != previous) {
      grp_num <- grp_num + 1
    }
    return_vector[i] <- grp_num
    previous <- x[i]
  }
  return(return_vector)
}
# make a new column in the dataset
df$connected <- connect_check(df$value)
The result of connect_check() is a vector of group ids that increments every time the input value changes from one position to the next. Here's a simple example:
> test <- c(1,2,2,4,7,5,5,5,2,2,3,8)
> test
[1] 1 2 2 4 7 5 5 5 2 2 3 8
> connect_check(test)
[1] 1 2 2 3 4 5 5 5 6 6 7 8
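The same run numbering can probably be had without an explicit loop. The sketch below uses base R's rle(); run_id() is a hypothetical helper name, not something from the answer above, and it is only meant as an equivalent alternative under the assumption that the input has no NA values.

```r
# run_id() is a hypothetical name; it numbers consecutive runs of equal values.
run_id <- function(x) {
  # as.character() lets this work for factors too (rle() rejects factors)
  runs <- rle(as.character(x))
  # repeat each run's index as many times as the run is long
  rep(seq_along(runs$lengths), times = runs$lengths)
}

test <- c(1, 2, 2, 4, 7, 5, 5, 5, 2, 2, 3, 8)
run_id(test)
```

For the test vector this gives the same ids as connect_check(test), returned as integers rather than doubles.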
The final solution here is to use the newly created df$connected to control connectivity via the group= aesthetic, and assign color=value as before. The only problem is that ggplot doesn't draw a line for a group that contains a single point, so the slightly wonky workaround is to first use a geom_path call to draw a light gray dotted line through all the points, then overplot the colored lines, grouped by df$connected and colored by df$value. In the end, it works. I think there might be a way if you use duplicated(df$value), but again... this works too. :)
g +
  geom_path(linetype=2, color='gray50', group=1) +
  geom_line(aes(color=value, group=connected), size=1) +
  geom_point(shape=21, size=3, aes(fill=value))
Note: I made the size= of the points smaller in the last plot so you can see the horizontal lines drawn where y remains constant and value either stays the same or changes.
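If the goal is to follow the question's rule literally (point color when the next point matches, black otherwise), one alternative is to draw each connection as its own segment. This is only a sketch under assumptions: the segs data frame, the seg_col column, the "differs" label and the pal vector are names invented here, and geom_segment() is swapped in for the geom_path()/geom_line() combination used above.

```r
set.seed(54321)
df <- data.frame(
  x = 1:50,
  y = sample(c("Path1", "Path2", "Path3"), 50, replace = TRUE),
  value = as.character(sample(1:5, 50, replace = TRUE))
)

# One row per connection between point i and point i + 1
n <- nrow(df)
segs <- data.frame(
  x = df$x[-n], y = df$y[-n],
  xend = df$x[-1], yend = df$y[-1],
  same = df$value[-n] == df$value[-1]
)
# Segment takes the point's value when both ends match, else a placeholder label
segs$seg_col <- ifelse(segs$same, df$value[-n], "differs")

# Named palette: the five point colors plus black for the placeholder
pal <- c(setNames(rainbow(5), as.character(1:5)), differs = "black")

# Plot only if ggplot2 is installed (the data preparation above is base R)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df, aes(x, y)) +
    geom_segment(data = segs,
                 aes(xend = xend, yend = yend, color = seg_col),
                 linetype = 2) +
    geom_point(shape = 21, size = 3, aes(fill = value)) +
    scale_fill_manual(values = pal[as.character(1:5)]) +
    scale_color_manual(values = pal) +
    theme_bw()
  print(p)
}
```

Because every connection is its own segment, there is no single-point-group problem, at the cost of building a second data frame.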
Final point: in your own dataset, as I mentioned, you could "bin" the data. I would go about that by first making a separate column, longData$value_bin (which could be as simple as longData$value_bin <- round(longData$value, 1)). You would then use longData$value_bin to compare the values of points to decide connectivity and color. If point fill= is still set to value, but line color= is set to value_bin, the line and point colors may not match exactly.
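As a rough illustration of the binning idea, the sketch below rounds a handful of values taken from the question's longData$value to one decimal place and compares neighbours on the binned column. The variable names (value, value_bin, same_raw, same_binned) are assumptions made for this example, and one decimal place is just one possible bin width.

```r
# A few values copied from longData$value in the question
value <- c(3.9878654949938, 3.99931926480599, 3.99724328049621,
           3.67686146649183, 3.65333094050839)

# Bin by rounding, then store as character so the column is discrete
value_bin <- as.character(round(value, 1))

# Does each point share its value with the next one?
same_raw    <- head(value, -1) == tail(value, -1)          # raw doubles
same_binned <- head(value_bin, -1) == tail(value_bin, -1)  # binned values

same_raw
same_binned
```

On the raw doubles no neighbouring pair is equal, while on the binned column the close values fall into the same bin and would be connected.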
Upvotes: 1