Reputation: 1809
I'm trying to create a map of all the ethnicities in the world, based on a SpatialPolygonsDataFrame (shape files can be downloaded here). My problem is that ggplot appears to reassign colors after each consecutive call to geom_polygon. The following code for two countries works fine, and all the areas/ethnicities can be distinguished from each other.
library(rgeos)
library(maptools)
library(rms)
library(igraph)
library(foreign)
library(sp)
library(spdep)
library(ggplot2)
setwd("yourdirectory")
# load GREG dataset
greg <- readShapePoly("GREG.shp", proj4string=CRS("+proj=longlat +datum=WGS84"))
# exclude small polygons (AREA <= 1000e+06, in the units of the AREA column)
greg <- greg[greg$AREA > 1000e+06,]
dev.off()
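# first country: create the base plot
temp <- greg[greg$COW == 325, ]
g <- ggplot(temp, aes(x = long, y = lat)) +
  geom_polygon(data = temp, aes(group = group, fill = group), size = 1)
# second country: add it as another layer (size is a constant, so it stays outside aes())
temp <- greg[greg$COW == 225, ]
g +
  geom_polygon(data = temp, aes(group = group, fill = group), size = 1) +
  theme(legend.position = "none")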
However, when I run this code in a loop over a large number of polygons (countries in this case), the colors of many polygons (check out Italy and Switzerland) become indistinguishable from each other, because ggplot assigns a unique color to each one (there are apparently 6011 polygons). Is there a way to keep the "non-unique" colors of each polygon in the combined plot? In other words, the plot should allow duplicate colors.
dev.off()
# start the plot with the polygons coded COW == 0
temp <- greg[greg$COW == 0, ]
g <- ggplot(temp, aes(x = long, y = lat)) +
  geom_polygon(data = temp, aes(group = group, fill = group), size = 1)
# add every other country as its own layer
for (cow in unique(greg$COW)) {
  if (cow == 0) next
  temp <- greg[greg$COW == cow, ]
  g <- g +
    geom_polygon(data = temp, aes(group = group, fill = group), size = 1)
}
g <- g + theme(legend.position = "none")
PS: you might have to export the second plot (i.e. to PNG) in order to actually see it.
Upvotes: 3
Views: 570
Reputation: 206496
So, as I mentioned before, you can only have one scale per aesthetic, so the fill colors don't reset for each country even if you add the countries as separate layers. To get a coloring like that, you'll need to create your own variable that behaves in that manner. What I've done is use interaction() to find the unique combinations of country/ethnicity. Then I took those values and mapped them to 1:12. I did that with
greg$ceid <- (as.numeric(interaction(greg$G1ID, greg$FIPS_CNTRY, drop = TRUE)) %% 12) + 1
Now this assumes that FIPS_CNTRY is a better measure of country than COW. It also appears that G1ID is a better ID for the particular ethnicity than GROUP1 across the dataset. If there is documentation for this data set, you'll probably want to read it carefully to verify this information. Most countries have fewer than 10 ethnicities, but there is one that has 206, and the next highest has 87.
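To make the interaction()/modulo step easier to follow, here is a minimal sketch on a made-up attribute table. The data frame toy and its values are hypothetical stand-ins for the greg attributes; only the column names FIPS_CNTRY and G1ID come from the answer above.

```r
# Toy stand-in for the attribute table of greg; all values are invented.
toy <- data.frame(
  FIPS_CNTRY = c("IT", "IT", "IT", "SZ", "SZ", "FR"),
  G1ID       = c(101, 102, 103, 101, 204, 305)
)

# One factor level per country/ethnicity combination that actually occurs.
combo <- interaction(toy$G1ID, toy$FIPS_CNTRY, drop = TRUE)

# Wrap the integer level codes into the range 1..12.
toy$ceid <- (as.numeric(combo) %% 12) + 1

# Count the distinct ethnicities per country, to see which countries
# would need more than 12 colors.
n_eth <- tapply(toy$G1ID, toy$FIPS_CNTRY, function(x) length(unique(x)))

print(toy)
print(n_eth)
```

Because interaction() varies its first argument fastest, the combinations belonging to one country receive consecutive codes, so a country with at most 12 groups ends up with distinct ceid values. A country with more groups than that will necessarily repeat colors.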
The ceid variable thus spreads the colors out across countries. The next trick is to call fortify explicitly to tell ggplot how to group the regions. We do that with
fortify(greg, region="ceid")
which produces something that looks like
       long      lat order  hole piece group id
1 -158.7752 63.22207     1 FALSE     1   1.1  1
2 -158.7752 63.36345     2 FALSE     1   1.1  1
3 -158.4783 63.54724     3 FALSE     1   1.1  1
4 -158.4359 63.64621     4 FALSE     1   1.1  1
5 -158.3228 63.83000     5 FALSE     1   1.1  1
6 -158.0262 63.98471     6 FALSE     1   1.1  1
where group indicates the polygon grouping and id corresponds to the regions we specified in the fortify call, i.e. the numbers 1:12. Now we plot all of this with
g <- ggplot(fortify(greg, region = "ceid"), aes(x = long, y = lat)) +
  geom_polygon(aes(group = group, fill = id), size = 1) +
  scale_fill_brewer(type = "qual", palette = "Set3") +
  theme(legend.position = "none")
Here I used a ColorBrewer qualitative color palette. That looks like this
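One detail that may matter if you care which color goes with which id: assuming fortify returns id as a character column (worth confirming with str() on your own output), a discrete scale will order the levels alphabetically rather than numerically. The sketch below uses a hypothetical data frame fort that mimics only the id column, and shows one way to fix the level order.

```r
# Toy stand-in for a fortified data frame; only the id column matters here.
fort <- data.frame(
  id = as.character(c(1, 2, 10, 11, 12, 3)),
  stringsAsFactors = FALSE
)

# Character ids sort alphabetically, so "10" comes before "2".
print(sort(unique(fort$id)))

# Fix the level order numerically so palette entries follow 1, 2, ..., 12.
fort$id <- factor(fort$id, levels = as.character(1:12))
print(levels(fort$id))
```

For a qualitative palette this only changes which color each region gets, not how many colors are used, so it is optional.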
If you instead plotted with the actual ethnicity IDs for group 1 and the default colors, you would get
g <- ggplot(fortify(greg, region = "G1ID"), aes(x = long, y = lat)) +
  geom_polygon(aes(group = group, fill = id), size = 1) +
  theme(legend.position = "none")
The latter plot is certainly "smoother", but it's really up to you what you want to communicate through the plot.
Upvotes: 5