Ferdinand

Reputation: 11

R multiple values per Polygon Choroplethr

I have the following issue and don't know how to proceed:

I want to create a choropleth heat map with data about German wind power plants. For this I use a shapefile that maps all German ZIP codes (this part works fine).

The second data frame contains all newly installed Wind power plants in Germany. I would like to show, in which ZIP area is the highest installed capacity over time.

When I try to do this with the choroplethr package, I run into a problem: the data frame with the power plants has about 1000000 rows, so many ZIP codes appear more than once (a lot of ZIP codes contain more than one wind power plant...).

Due to this, I get the following error message:

anyDuplicated(self$user.df$region) == 0 is not TRUE

Here is the code. It is based on this example here: https://www.r-bloggers.com/case-study-mapping-german-zip-codes-in-r/

library(sf)
library(choroplethr)
library(dplyr)
library(ggplot2)
library(rgdal)
library(maptools)
library(gpclib)
library(readr)
library(R6)

ger_plz <- readOGR(dsn = ".", layer = "plz-5stellig")
ger_plz2 <- read_sf("...plz-5stellig.shp")

ger_plz@data$id <- rownames(ger_plz@data)
ger_plz.point <- fortify(ger_plz, region="id")
ger_plz.df <- inner_join(ger_plz.point,ger_plz@data, by="id")

BNETZAVZ <-read.csv2("WindPower DATA.csv", 
                      header = TRUE, sep = ";", dec = ",")
BNETZAVZ_k <- subset(BNETZAVZ, inst_leistung >= 100 & energietraeger >= "7" & energietraeger<="8" & stat_Relevanz=="1",
                     select=c(anlagenschl, plz, inst_leistung, spannungsebene, inbetriebnahme, ausserbetriebnahme, regelzone_name, energietraeger))
#BNETZAVZ_k$inbetriebnahme <- dmy_hms(as.character(BNETZAVZ$inbetriebnahme))
print(BNETZAVZ_k$plz)
# The date conversion works like this :)
BNETZAVZ_k$inbetriebnahme <- as.Date(BNETZAVZ_k$inbetriebnahme, format = "%d.%m.%Y %H:%M:%S")
BNETZAVZ_k2000 <- subset(BNETZAVZ_k, inbetriebnahme >="2000-01-01")

# variable name 'region' is needed for choroplethr
ger_plz.df$region <- ger_plz.df$plz
# subclass choroplethr to make a class for my needs
GERPLZChoropleth <- R6Class("GERPLZChoropleth",
                            inherit = choroplethr:::Choropleth,
                            public = list(
                              initialize = function(user.df) {
                                super$initialize(ger_plz.df, user.df)
                              }
                            )
)
# choroplethr needs these two column names: 'region' and 'value'
colnames(BNETZAVZ_k2000)[1] <- "EEG-key"
colnames(BNETZAVZ_k2000)[2] <- "region"
colnames(BNETZAVZ_k2000)[3] <- "value"
BNA <- data.frame(BNETZAVZ_k2000$region, BNETZAVZ_k2000$value)
colnames(BNA) <- c("region", "value")
#instantiate new class with data
c <- GERPLZChoropleth$new(BNA)

# THE ERROR MESSAGE IS DISPLAYED IN THE LINE ABOVE...

# plot the data
c$ggplot_polygon = geom_polygon(aes(fill = value), color = NA)
c$title = "Capacity Windkraft BNETZA"
c$legend = "Capacity per Zipcode"
c$set_num_colors(9)
c$render()

Upvotes: 1

Views: 350

Answers (1)

Ari

Reputation: 1972

I'm the author of choroplethr and unfortunately I'm having some difficulty understanding your question. However, I think that the key part of your question is this:

I would like to show, in which ZIP area is the highest installed capacity over time.

I don't know what "highest installed capacity over time" exactly means, or how that value is derived from the data that you have.

But choroplethr requires your data to be in a very particular format:

  1. A dataframe with one column called region and one column called value.
  2. Each value in region should match with a region in your shapefile.

Behind the scenes choroplethr merges your dataframe with the shapefile. If your data contains duplicate regions then the merge cannot occur, because it is ambiguous which value you want to use.
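
The condition from the error message can be reproduced on a toy data frame. This is only a minimal sketch: the ZIP codes and capacities below are made up for illustration and are not taken from the real data.

```r
# Toy data (hypothetical values): ZIP code 01067 appears twice,
# as it would when two plants sit in the same ZIP area
plants <- data.frame(
  region = c("01067", "01067", "10115"),
  value  = c(1500, 2000, 800),
  stringsAsFactors = FALSE
)

# The same condition that appears in the error message
has_no_duplicates <- anyDuplicated(plants$region) == 0
print(has_no_duplicates)

# List the regions that occur more than once
duplicated_regions <- unique(plants$region[duplicated(plants$region)])
print(duplicated_regions)
```

Running the same two checks on the data frame passed to the choropleth object shows which regions are causing the error.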

From reading your question I'm not exactly sure what your situation is. But I think it's likely that you want to process your data so that each region appears once, and that the value is some function of the two files you talk about.
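
One way to get each region to appear once is to aggregate before building the map. The sketch below assumes that the value you want is the total installed capacity per ZIP code; that is a guess on my part, and another function (mean, max, count) would work the same way. The data is a made-up toy example.

```r
# Toy data (hypothetical values) with a duplicated ZIP code
plants <- data.frame(
  region = c("01067", "01067", "10115"),
  value  = c(1500, 2000, 800),
  stringsAsFactors = FALSE
)

# Sum the capacity of all plants that share a ZIP code,
# leaving exactly one row per region
capacity_per_zip <- aggregate(value ~ region, data = plants, FUN = sum)
print(capacity_per_zip)

# The check that choroplethr performs now passes
stopifnot(anyDuplicated(capacity_per_zip$region) == 0)
```

In the question's code the same step would presumably be applied to `BNA` before it is passed to `GERPLZChoropleth$new()`. To look at capacity "over time", the data could be filtered by `inbetriebnahme` first, producing one aggregated data frame (and one map) per period.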

However, for the sake of completeness, I'll mention that it's at least possible that you are trying to create a bivariate choropleth. Choroplethr currently does not have that functionality.

Upvotes: 2
