ggplot retain duplicate colors for combined plots

Question

I'm trying to create a map of all the ethnicities in the world - based on a SpatialPolygonsDataFrame (shape files can be downloaded here). My problem is that ggplot appears to reassign colors after each consecutive call to geom_polygon. The following code for two countries works fine and all the areas/ethnicities can be distinguished from each other.

library(rgeos)
library(maptools)
library(rms)
library(igraph)
library(foreign)
library(sp)
library(spdep)
library(ggplot2)

setwd("yourdirectory")

# load GREG dataset
greg <- readShapePoly("GREG.shp", proj4string=CRS("+proj=longlat +datum=WGS84")) 
# exclude small polygons (AREA <= 1000e+06)
greg <- greg[greg$AREA > 1000e+06,]


dev.off()
temp <- greg[greg$COW==325,]
g<-ggplot(temp, aes(x = long, y = lat))  + 
   geom_polygon(data=temp,aes(group = group, fill=group, size=1))

temp <- greg[greg$COW==225,]
g + 
  geom_polygon(data=temp,aes(group = group, fill=group, size=1)) +
  theme(legend.position = "none")

However, when I run this code in a loop over a large number of polygons (countries in this case), the colors of many polygons (check out Italy and Switzerland) become indistinguishable from each other, because ggplot assigns a unique color to each one (there are apparently 6011 polygons). Is there a way to keep the "non-unique" colors of each polygon in the combined plot? In other words, the plot should allow duplicate colors.

dev.off()
temp <- greg[greg$COW==0,]
g <- ggplot(temp, aes(x = long, y = lat)) + 
  geom_polygon(data=temp,aes(group = group,  fill=group, size=1))


for (cow in unique(greg$COW)) {
  if (cow==0) next
  temp <- greg[greg$COW==cow,]
  g <- g + 
    geom_polygon(data=temp, aes(group = group, fill=group, size=1))
}
g <- g + theme(legend.position = "none")

PS: You might have to export the second plot (e.g., to PNG) in order to actually see it.

MrFlick · Accepted Answer

As I mentioned before, you can only have one scale per aesthetic, so the fill colors don't reset for each country even if you add the countries as separate layers. To get a coloring like that, you'll need to create your own variable that behaves in that manner. I used interaction() to find the unique combinations of country and ethnicity, then mapped those values to 1:12. I did that with

greg$ceid <- (as.numeric(interaction(greg$G1ID, greg$FIPS_CNTRY, drop = TRUE)) %% 12) + 1
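As a minimal sketch of what that line does, here is the same mapping on a made-up attribute table. The `toy` data frame and its values are purely illustrative and are not taken from the GREG data; only the column names mirror the ones used above.

```r
# Toy stand-in for the attribute table of greg; the values are invented.
toy <- data.frame(
  G1ID       = c(10, 10, 20, 30, 30, 40),
  FIPS_CNTRY = c("IT", "SZ", "IT", "SZ", "FR", "FR")
)

# One factor level per ethnicity/country pair that actually occurs
# (drop = TRUE discards combinations that are not present in the data).
pair <- interaction(toy$G1ID, toy$FIPS_CNTRY, drop = TRUE)

# Fold the integer level codes into 1..12 so that at most 12 fill
# colors are needed, no matter how many pairs there are.
toy$ceid <- (as.numeric(pair) %% 12) + 1

print(nlevels(pair))
print(toy)

# The same folding applied to 30 consecutive codes stays within 1..12.
print(range((1:30 %% 12) + 1))
```

Because the codes are simply taken modulo 12, two polygons that share a `ceid` are not guaranteed to be far apart on the map; the mapping only limits how many distinct colors are used.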

The mapping above assumes that FIPS_CNTRY is a better measure of country than COW. It also appears that G1ID is a better ID for the particular ethnicity than GROUP1 across the dataset. If there is documentation for this data set, you'll probably want to read it carefully to verify this information. Most countries have fewer than 10 ethnicities, but there is one that has 206, and the next highest has 87.
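One way to check counts like these yourself is to tabulate the number of distinct ethnicity IDs per country. The sketch below uses a hypothetical helper, `count_groups`, and an invented `toy` table so that it runs on its own; with the real data you would presumably pass the attribute table instead (for a SpatialPolygonsDataFrame that is `greg@data`).

```r
# Hypothetical helper: number of distinct G1ID values per FIPS_CNTRY,
# sorted so the countries with the most ethnicities come first.
count_groups <- function(d) {
  per_country <- sapply(
    split(d$G1ID, d$FIPS_CNTRY),
    function(ids) length(unique(ids))
  )
  sort(per_country, decreasing = TRUE)
}

# Invented example data, not taken from the GREG dataset.
toy <- data.frame(
  G1ID       = c(1, 1, 2, 3, 3, 4, 5),
  FIPS_CNTRY = c("IT", "IT", "IT", "SZ", "SZ", "SZ", "FR")
)

counts <- count_groups(toy)
print(counts)
```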

This spreads the colors out across countries. The next trick is to call fortify explicitly to tell ggplot how to group the regions. We do that with

fortify(greg, region="ceid")

which produces something that looks like

       long      lat order  hole piece group id
1 -158.7752 63.22207     1 FALSE     1   1.1  1
2 -158.7752 63.36345     2 FALSE     1   1.1  1
3 -158.4783 63.54724     3 FALSE     1   1.1  1
4 -158.4359 63.64621     4 FALSE     1   1.1  1
5 -158.3228 63.83000     5 FALSE     1   1.1  1
6 -158.0262 63.98471     6 FALSE     1   1.1  1

where group indicates the polygon grouping and id corresponds to the regions we specified in the fortify call, so these are the numbers 1:12. Now we plot all of this with

g <- ggplot(fortify(greg, region="ceid"), aes(x = long, y = lat)) + 
  geom_polygon(aes(group = group,  fill = id), size=1) + 
  scale_fill_brewer(type="qual", palette = "Set3") + 
  theme(legend.position = "none")

Here I used a ColorBrewer qualitative color palette. That looks like this:

[image: world map of the polygons filled with the 12 colors of the Set3 palette]
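For anyone without the shapefile at hand, the same plotting pattern can be tried on hand-made polygons. This is only a sketch under stated assumptions: the `make_square` helper and the `toy` data frame are invented here to imitate the columns that fortify returns (long, lat, group, id), and two recycled color ids stand in for the twelve used above.

```r
library(ggplot2)

# Hypothetical helper: one unit square shifted i units to the right,
# shaped like a slice of fortify() output.
make_square <- function(i) {
  data.frame(
    long  = c(0, 1, 1, 0) + i,
    lat   = c(0, 0, 1, 1),
    group = paste0("poly", i),
    # Recycle two color ids across the polygons (the answer uses 12).
    id    = as.character(i %% 2 + 1)
  )
}

# Four separate polygons that share only two fill ids.
toy <- do.call(rbind, lapply(0:3, make_square))

# group keeps the polygons separate; fill uses the recycled id, so
# colors repeat instead of every polygon getting its own.
p <- ggplot(toy, aes(x = long, y = lat)) +
  geom_polygon(aes(group = group, fill = id)) +
  scale_fill_brewer(type = "qual", palette = "Set3") +
  theme(legend.position = "none")

print(p)
```

The point of the sketch is the split between the two aesthetics: `group` decides which vertices form one polygon, while `fill` decides the color, and nothing requires the two to have the same number of levels.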

If you instead plotted with the actual ethnicity IDs for group 1 and the default colors, you would get

g <- ggplot(fortify(greg, region="G1ID"), aes(x = long, y = lat)) + 
  geom_polygon(aes(group = group,  fill=id), size=1) + 
  theme(legend.position = "none")

[image: the same map filled by G1ID with the default ggplot2 colors]

The latter plot is certainly "smoother", but it's really up to you what you want to communicate through the plot.
