RNovice
RNovice

Reputation: 41

Incorrect plotting of data on Choropleth map with ggplot2 in R

I am trying to make a Choropleth map of Germany using ggplot2. My data is a .csv file with 2 rows (RS= containing numbers 1 to 16 for each German state & Tariff= 16 random positive and negative numbers).

    RS  Tariff
1   01  -5.25
2   02  7.16
3   03  6.65
4   04  3.10
5   05  3.69
6   06  2.49
7   07  1.89
8   08  3.93
9   09  -5.84
10  10  -2.61
11  11  -0.21
12  12  2.35
13  13  -5.94
14  14  -7.54
15  15  -3.27
16  16  -8.75

Also I have a shape file Germany shape file. What I want to do is to plot this positive and negative numbers onto map of germany for each state, with 2 colors (positive=Green and Negative=Red). Below is my code

library(XLConnect)
library(sp)
library(rgdal)
library(ggplot2)
library(plyr)
library(RColorBrewer)
library(DataCombine)
library(rgeos)
library(maptools)

#### EEG Data Read ####
eeg<-read.csv(file = "data/testdata1.csv", skip = 0, sep = ",", dec=".",       header=TRUE)
colnames(eeg)<-c("RS", "Tariff")
eeg$RS<-   c("01","02","03","04","05","06","07","08","09","10","11","12","13","14","15","16")
eeg$RS<-as.factor(eeg$RS)
eeg$Tariff<- as.numeric(eeg$Tariff)

#### Shape Data Read ####
bundesl<-readOGR("data/new_shape/vg2500_bld.shp", "vg2500_bld")
bundesl@data<- bundesl@data[order(bundesl$RS, na.last=NA),]

### Rearrange shape data for better merging ###
levels(bundesl$GEN)<- c("Schleswig-Holstein", "Mecklenburg-Vorpommern",   "Hamburg", "Bremen", "Niedersachsen", "Sachsen-Anhalt", "Brandenburg", 
                    "Berlin", "Nordrhein-Westfalen", "Hessen","Thüringen","Sachsen", "Rheinland-Pfalz", "Saarland", "Baden-  Württemberg", "Bayern")
bundesl$GEN<- c("Schleswig-Holstein", "Mecklenburg-Vorpommern", "Hamburg", "Bremen", "Niedersachsen", "Sachsen-Anhalt", "Brandenburg", 
            "Berlin", "Nordrhein-Westfalen", "Hessen","Thüringen","Sachsen",  "Rheinland-Pfalz", "Saarland", "Baden-Württemberg", "Bayern")
bundesl$SHAPE_LENG<-  c("1217255.7","1780980.8","175253.8","154971.6","2016496.4","949096.8",
                   "1295460.4","180751.2","1352108","1105092.8","961942.7","979294.3","910650.4",
                   "282910.8","1298891.7","2046039.3")
bundesl$SHAPE_AREA<- c("15857425536","23044684847","760539820","405480872","47716406483","20494982327","29653902483","886480139","34047269991","21092318103","16178531941","18401642456","19834907486","2578541706","35801397076","70550070623")

# #### Shape Data und EEG Data join ####
bundesl@data<-merge(bundesl@data, eeg, by="RS",  all=TRUE)

# #### Shapes Plot ####
bundesl@data$id <- (as.numeric(rownames(bundesl@data))-1)
bundesl.df<-fortify(bundesland)
bundesl.df <- join(bundesl.df, bundesl@data, by="id")


 ggp <- ggplot(data=bundesl.df, aes(x=long, y=lat, group=group))
 ggp <- ggp + geom_polygon(aes(fill=Tariff), col="black") 
 ggp <- ggp + coord_map() 
 ggp <- ggp + scale_fill_continuous(name=expression(Tariff), low = "red", high = "green", space = "Lab", na.value = "white", guide = "colourbar")
ggp <- ggp + theme_minimal()
ggp <- ggp + theme(axis.title=element_blank(), axis.ticks=element_blank(),   axis.text=element_blank())
 ggp

So far I manage to plot the map, but with wrong data mapping. I mean state with positive Tariff like Schleswig-Holstein should be green but are red and Bavaria should be red but is green.

My guess is that there is problem with fortify function. My data is only 16 rows but after fortify it print 1000+ rows. Why?? and this is something which is causing mismatching of data. I did all the search on internet I possibly can for the solution. I would appreciate if anybody can give me an answer as to why this problem is occurring.

Thank you for your help in advance!

Upvotes: 0

Views: 1545

Answers (2)

MrLoh
MrLoh

Reputation: 466

This is a bit old, but since it is the best question about germany choropleth maps, I wanted to add a few things i learned while following @hrbrmstr's great answer.

As you can see in the map he plotted, Berlin is just getting the color of Brandenburg. To fix this, the order in the bundesl_map has to be edited, to make sure the Berlin (10) comes after Brandenburg (11). So the full processing of the map should look like this:

library(rgdal)
library(rgeos)
library(maptools)

bundesl <- readOGR("vg2500_geo84/vg2500_bld.shp", "vg2500_bld")
bundesl@data<- bundesl@data[order(bundesl$RS, na.last=NA),]
bundesl_map <- fortify(bundesl, region="RS")
bundesl_map <- rbind(
    bundesl_map[bundesl_map$id != 10, ], 
    bundesl_map[bundesl_map$id == 10, ]
)
saveRDS(bundesl_map, "bundesland")

In the last step we save the map for future use (bundesl_map <- readRDS("budesland")). Here is a copy of the file I created with named ids.

The plotting can also be made a bit more concise as follows:

library(magrittr)
library(ggplot2)
library(viridis)

egg %>% ggplot(aes(fill=Tariff, map_id=RS)) +
    geom_map(map=bundesl_map, color="white", size=0.2) + 
    geom_text(aes(x=x, y=y, label=egg$RS), size=2, color="white") +
    coord_map("mercator") +
    expand_limits(x=bundesl_map$long, y=bundesl_map$lat) +
    scale_fill_viridis(begin=0.4, end=0.9, breaks=-8:7, guide=guide_legend(reverse=T)) + 
    theme_map(base_size=8)

where theme_map is defined as:

theme_map <- function(...) {
    theme_classic(...) %+replace% 
    theme(
        axis.ticks = element_blank(), 
        axis.text = element_blank(),
        axis.title = element_blank(), 
        line=element_blank()
    )
}

This will yield a map like this:

map

Upvotes: 0

hrbrmstr
hrbrmstr

Reputation: 78792

fortify puts the polygons from the shapefile into something ggplot can plot, hence the 1,000+ rows. While you can attach values to the fortified polygons, it's not necessary.

So, you don't really have to go through all that trouble for the choropleth. Take a look at the following. I added in some extra bits to show which values get mapped to which RS:

library(rgdal)
library(ggplot2)
library(gridExtra)

egg <- read.table(text="RS  Tariff
01  -5.25
02  7.16
03  6.65
04  3.10
05  3.69
06  2.49
07  1.89
08  3.93
09  -5.84
10  -2.61
11  -0.21
12  2.35
13  -5.94
14  -7.54
15  -3.27
16  -8.75", header=TRUE, colClasses=c("character", "numeric"))

bundesl <- readOGR("vg2500_geo84/vg2500_bld.shp", "vg2500_bld")
bundesl@data<- bundesl@data[order(bundesl$RS, na.last=NA),]

# good projection for germany but if you intende to draw additional
# lines or points you'll have to project them before plotting so this
# may be more trouble than it's worth and you can just use 
# coord_map("mollweide") or something else that works for you besides mercator

bundesl <- spTransform(bundesl, CRS("+proj=utm +zone=33 +ellps=WGS84 +datum=WGS84 +units=m +no_defs "))

bundesl_map <- fortify(bundesl, region="RS")

# only doing this bit to plot the RS # at the center of each polygon
# totally not necessary for the choropleth

egg <- cbind(egg, data.frame(gCentroid(bundesl, byid=TRUE)))

gg <- ggplot()

# this bit ensures you have the outlines

gg <- gg + geom_map(data=bundesl_map, map=bundesl_map,
                    aes(x=long, y=lat, map_id=id),
                    color="#7f7f7f", size=0.15)

# this bit here does your choropleth

gg <- gg + geom_map(data=egg, map=bundesl_map,
                    aes(fill=Tariff, map_id=RS),
                    color="#7f7f7f", size=0.15)
gg <- gg + geom_text(data=egg, aes(x=x, y=y, label=RS), size=3)
gg <- gg + coord_equal() # we already projected it
gg <- gg + scale_fill_continuous(name=expression(Tariff), 
                                 low="red", high="green", space="Lab", 
                                 na.value="white", guide="colourbar")
gg <- gg + labs(x=NULL, y=NULL)

# decent map theme 

gg <- gg + theme_bw()
gg <- gg + theme(panel.grid=element_blank())
gg <- gg + theme(panel.border=element_blank())
gg <- gg + theme(axis.ticks=element_blank())
gg <- gg + theme(axis.text=element_blank())


gt <- tableGrob(cbind(bundesl@data[,c(2,4)], egg[,2]))

grid.arrange(gg, gt, ncol=2)

enter image description here

08 & 16 have unicode in them, hence the lack of display without conversion. I also realize the plotting of the RS number on the centroid is problematic for Berlin & Brandenburg, but it was to give a general idea, not be perfect.

I'd highly suggest using cut to define 5 or 6 standardized breaks for the values vs use a continuous scale.

Upvotes: 2

Related Questions