Empiromancer

Reputation: 3854

ggplot, Label small east coast states on US choropleth with floating colored markers

I'm trying to create a US choropleth and would like to add labels for the small east coast states, like in this example:

[Image: example map showing the values for the small east coast states as labelled dots in the Atlantic]

The code I've been using for my map is

gg <- ggplot()
gg <- gg + geom_map(data = usfix, map = usfix, 
                    aes(x=long, y=lat, map_id=id), 
                    fill="#ffffff", color="#ffffff", size=0.15)
gg <- gg + geom_map(data = state.avg, map=usfix, 
                    aes(fill = score, map_id=id), 
                    color="#ffffff", size=0.15)
gg <- gg + labs(x=NULL, y=NULL)
gg <- gg + theme(panel.border = element_blank())
gg <- gg + theme(panel.background = element_blank())
gg <- gg + theme(axis.ticks = element_blank())
gg <- gg + theme(axis.text = element_blank())
gg <- gg + theme(panel.grid = element_blank())
gg <- gg + theme(legend.position = c(.15,.7))
gg <- gg + coord_fixed()
gg

I've attempted to add the labels as a new point layer

gg <- gg + geom_point(aes(x=2500000,y=(-1:4)*(-100000), color = percentile), 
                      data=state.avg[state.avg$STATE_CODE %in% 
                      c("MA", "RI", "CT", "NJ", "DE", "MD"),])
gg <- gg + geom_text(data = state.avg[state.avg$STATE_CODE %in% 
                     c("MA", "RI", "CT", "NJ", "DE", "MD"),], 
                     aes(label = STATE_CODE, x=2600000,y=(-1:4)*(-100000)),
                     hjust=0, size = 4)

but this produces a second legend, and it also puts the colors of the state label dots on a slightly different scale from the colors of the states themselves. The mismatch would be even worse if percentile for these six states didn't span a good chunk of the range of percentile across the country, and I have no guarantee that it will for every map I want to make.

Is there a less ad hoc way of making labels like this, one that wouldn't spawn a second legend or use its own color range? Or, if not, is there a way to remove the second legend and bind the label colors to the colors of the states on the map?

EDIT: I've figured out how to remove the second legend (gg <- gg + guides(color=FALSE) does the trick; newer ggplot2 releases expect guides(color="none") instead), but the color scale of the map and the color scale of the points are still different.

Upvotes: 2

Views: 1486

Answers (1)

Empiromancer

Reputation: 3854

Following hrbrmstr's excellent suggestion, I solved the problem using ggplot_build() as follows:

Making use of str {utils} to navigate my way through the ggplot_build(gg) object, I find that the colors for each state are in ggplot_build(gg)$data[[2]]$fill, and the state labels are in ggplot_build(gg)$data[[2]]$map_id. Thus, I can get the color for Utah using

ggplot_build(gg)$data[[2]]$fill[which(ggplot_build(gg)$data[[2]]$map_id == 'utah')]
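The same lookup can be tried on a toy plot. This is only a sketch: toy_map, toy_avg and toy_gg are made-up stand-ins for usfix, state.avg and gg, and it uses layer_data(), the ggplot2 helper that returns ggplot_build(plot)$data[[i]], so the plot is built once and the fills are stored as a named vector.

```r
library(ggplot2)

# Hypothetical toy data standing in for usfix and state.avg:
# two square "states" and one score per state.
toy_map <- data.frame(
  long = c(0, 1, 1, 0, 2, 3, 3, 2),
  lat  = c(0, 0, 1, 1, 0, 0, 1, 1),
  id   = rep(c("utah", "maryland"), each = 4)
)
toy_avg <- data.frame(id = c("utah", "maryland"), score = c(10, 90))

# Same two-layer structure as the real map: blank base layer, then the fills.
toy_gg <- ggplot() +
  geom_map(data = toy_map, map = toy_map, aes(map_id = id), fill = "#ffffff") +
  geom_map(data = toy_avg, map = toy_map, aes(fill = score, map_id = id)) +
  expand_limits(x = toy_map$long, y = toy_map$lat)

# Build once, then index the computed fill colors by state name.
built <- layer_data(toy_gg, 2)
state_colors <- setNames(built$fill, built$map_id)

state_colors[["utah"]]                  # hex color ggplot2 assigned to Utah
state_colors[c("maryland", "utah")]     # several states in a chosen order
```

Indexing the named vector with a character vector returns the colors in the order requested, which helps keep them aligned with a hand-written vector of labels.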

Thus, I can color the points for the east coast states with

# Build the plot once and reuse the layer data for every lookup
map.layer <- ggplot_build(gg)$data[[2]]
state.getcolor <- function(x) {
  map.layer$fill[which(map.layer$map_id == x)]
}
eastcoast.colors <- sapply(c("massachusetts", "rhode island", "connecticut", "new jersey", 
                           "delaware", "maryland", "district of columbia"), state.getcolor)

And pass it to the geom_point layer

gg <- gg + geom_point(aes(x=2500000, y=(-1:5)*(-100000)), color = eastcoast.colors, 
                      data = state.avg[state.avg$STATE_CODE %in% 
                      c("MA", "RI", "CT", "NJ", "DE", "MD", "DC"),])
gg <- gg + geom_text(aes(label = c("MA", "RI", "CT", "NJ", "DE", "MD", "DC"), 
                     x=2600000, y=(-1:5)*(-100000)), hjust=0, size=3, 
                     data = state.avg[state.avg$STATE_CODE %in% 
                     c("MA", "RI", "CT", "NJ", "DE", "MD", "DC"),])
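As an aside, a possible alternative that avoids extracting colors at all is to draw the markers with a fillable point shape (shape 21) and map its fill to the same variable as the map, so that both layers go through the one fill scale. The sketch below assumes toy data; toy_map, toy_avg, small and the marker positions are all hypothetical.

```r
library(ggplot2)

# Hypothetical toy data: two square "states" with a score each.
toy_map <- data.frame(
  long = c(0, 1, 1, 0, 2, 3, 3, 2),
  lat  = c(0, 0, 1, 1, 0, 0, 1, 1),
  id   = rep(c("utah", "maryland"), each = 4)
)
toy_avg <- data.frame(id = c("utah", "maryland"),
                      code = c("UT", "MD"),
                      score = c(10, 90))

# Rows to label, with made-up off-map positions for the markers.
small <- toy_avg[toy_avg$code %in% c("MD"), ]
small$x <- 4
small$y <- seq(1, by = -0.5, length.out = nrow(small))

toy_gg <- ggplot() +
  geom_map(data = toy_avg, map = toy_map, aes(fill = score, map_id = id),
           color = "#ffffff") +
  expand_limits(x = toy_map$long, y = toy_map$lat) +
  # shape 21 has a fill, so the marker is colored by the map's fill scale
  geom_point(data = small, aes(x = x, y = y, fill = score),
             shape = 21, size = 4, color = "#ffffff") +
  geom_text(data = small, aes(x = x + 0.2, y = y, label = code), hjust = 0) +
  coord_fixed()
```

Because the markers only use the fill aesthetic, no separate color scale or second legend is created; this relies on the markers showing the same variable as the map (score here).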

I'm still not entirely sure why the geom_point and geom_text layers fail with the error "Error in data.frame(x = 2500000, y = c(1e+06, 0, -1e+06, -2e+06, -3e+06, : arguments imply differing number of rows: 1, 7, 51" when I remove their data specification. I assume the error means that, in the absence of specified data, ggplot tries to use the data for the map plot, but I'm not asking it to do anything with data, just to plot points. So there's probably still a gap in my understanding of ggplot.
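For drawing markers from plain vectors rather than from a data frame, ggplot2 provides annotate(), which might be a better fit than geom_point and geom_text here. A minimal sketch, assuming made-up positions, labels and colors (in the real plot the colors would be eastcoast.colors):

```r
library(ggplot2)

# Hypothetical labels, colors and positions, for illustration only.
labs_small <- c("MA", "RI", "CT")
cols_small <- c("#132B43", "#2F6A9A", "#56B1F7")
ys <- seq(1, by = -0.5, length.out = length(labs_small))

p <- ggplot() +
  # annotate() takes the vectors directly; the length-1 x is recycled
  annotate("point", x = 4, y = ys, colour = cols_small, size = 4) +
  annotate("text", x = 4.2, y = ys, label = labs_small, hjust = 0, size = 3)
```

The two annotate() calls can be added to an existing plot such as gg in the same way, with no data argument involved.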

Upvotes: 1
