bison2178
bison2178

Reputation: 789

Merge two states North and South Dakota in geom_map

In this example of plotting US States using geom_map

library(ggplot2)
library(dplyr)

us <- map_data("state")

arr <- USArrests %>% 
  add_rownames("region") %>% 
  mutate(region=tolower(region))

gg <- ggplot()
gg <- gg + geom_map(data=us, map=us,
                    aes(x=long, y=lat, map_id=region),
                    fill="#ffffff", color="#ffffff", size=0.15)
gg <- gg + geom_map(data=arr, map=us,
                    aes(fill=Murder, map_id=region),
                    color="#ffffff", size=0.15)
gg <- gg + scale_fill_continuous(low='thistle2', high='darkred', 
                                 guide='colorbar')
gg <- gg + labs(x=NULL, y=NULL)
gg <- gg + coord_map("albers", lat0 = 39, lat1 = 45) 
gg <- gg + theme(panel.border = element_blank())
gg <- gg + theme(panel.background = element_blank())
gg <- gg + theme(axis.ticks = element_blank())
gg <- gg + theme(axis.text = element_blank())
gg

How do I merge two states together , for example, North and South Dakota, how do I get rid of the boundary between these two states and show them as "Megakotas" .

Upvotes: 2

Views: 615

Answers (2)

Calum You
Calum You

Reputation: 15082

If you don't need to use geom_map and are willing to dive into some of the spatial packages in R such as sf, you can achieve this as follows. Here's my example using the GADM Database of Global Administrative Regions to pull the state boundaries.

  1. Use raster::getData as a convenient shortcut to get the boundary files. level refers to the administrative level, where 0 is the country and 1 is the first level below that. Note that this will save a file into your working directory. I convert to the Albers projection as your example does using st_transform.

  2. Join the polygons onto the USArrests dataset and use the convenient dplyr tools to combine the Dakotas. Note that I sum the values for the two states although I'm not clear that's the right thing to do (isn't this value a mean so maybe we should take the mean?) This automatically unions the two states' boundaries.

  3. Before plotting, I use st_simplify to reduce the number of vertices to make plotting faster. Then we can simply use geom_sf and get the desired plot.

library(raster)
library(tidyverse)
library(sf)
# library(sp)
# library(rgeos)

states <- getData("GADM", country = "USA", level = 1) %>%
  st_as_sf() %>%
  st_transform(crs = 3310)

megakotas <- states %>% 
  left_join(
    y = rownames_to_column(USArrests, var = "State"),
    by = c("NAME_1" = "State")
  ) %>%
  filter(!NAME_1 %in% c("District of Columbia", "Alaska", "Hawaii")) %>%
  mutate(State = fct_collapse(NAME_1, Megakotas = c("North Dakota", "South Dakota"))) %>%
  group_by(State) %>%
  summarise(Murder = sum(Murder)) %>%
  st_simplify(dTolerance = 1000)

ggplot(megakotas) +
  geom_sf(aes(fill = Murder)) +
  scale_fill_viridis_c() +
  theme_minimal()

Created on 2019-03-07 by the reprex package (v0.2.1)

Upvotes: 2

acylam
acylam

Reputation: 18701

Turns out to be more work then I have expected, but here is my solution. The basic idea is to use the maps::map function along with boundary = TRUE and interior = FALSE to get the coordinates of "north dekota" and "south dekota" without the interior boundary and plot that separately from other states.

Step 1: Grab "megakotas" coordinates...

# To install gissr
# library(devtools)
# Install dependency
# install_github("skgrange/threadr")
# Install gissr
# install_github("skgrange/gissr")

library(dplyr)
library(maps)
library(gissr)

us_coord <- map_data("state") 
megakotas_coord <- map("state", regions = c("north dakota", "south dakota"), 
                       boundary = TRUE, interior = FALSE, plot = FALSE)[c("x", "y")] %>%
  as.data.frame() %>%
  sort_points(y = "y", x = "x") %>%
  mutate(region = "megakotas")

Output:

> head(megakotas_coord)
          x        y    region
1 -104.0491 45.21210 megakotas
2 -104.0434 45.87673 megakotas
3 -104.0491 45.93976 megakotas
4 -104.0491 45.93976 megakotas
5 -104.0434 46.27207 megakotas
6 -104.0434 46.53563 megakotas

Here, we use gissr::sort_points to sort the coordinates from map in clockwise order, then replaced region with "megakotas" instead of "north dakota" and "south dakota".

Step 2: Replace "north dakota" and "south dakota" with "megakotas" in arr and sum their values...

arr <- USArrests %>% 
  add_rownames("region") %>%
  mutate(region = replace(tolower(region), tolower(region) %in% c("north dakota", "south dakota"), 
                          "megakotas")) %>%
  group_by(region) %>%
  mutate_all(sum) 

Output:

> arr %>% filter(region == "megakotas")
# A tibble: 2 x 5
# Groups:   region [1]
  region    Murder Assault UrbanPop  Rape
  <chr>      <dbl>   <int>    <int> <dbl>
1 megakotas    4.6     131       89  20.1
2 megakotas    4.6     131       89  20.1

Step 3: Plot us_coord and megakotas_coord separately, and use the data argument in geom_map to control what gets printed...

library(ggplot2)

ggplot(mapping = aes(map_id=region)) +
  geom_map(data = filter(arr, region != "megakotas"), 
           map = us_coord, aes(fill = Murder),
           size = 0.15, color="#ffffff") +
  expand_limits(x = us_coord$long, y = us_coord$lat)

enter image description here

ggplot(mapping = aes(map_id=region)) +
  geom_map(data = filter(arr, region == "megakotas"),
           map = megakotas_coord, aes(fill = Murder), 
           size = 0.15, color = "#ffffff") + 
  expand_limits(x = us_coord$long, y = us_coord$lat)

enter image description here

Final Step: Combine everything...

# To install gissr
# library(devtools)
# Install dependency
# install_github("skgrange/threadr")
# Install gissr
# install_github("skgrange/gissr")

library(ggplot2)
library(dplyr)
library(maps)
library(gissr)

us_coord <- map_data("state") 
megakotas_coord <- map("state", regions = c("north dakota", "south dakota"), 
                       boundary = TRUE, interior = FALSE, plot = FALSE)[c("x", "y")] %>%
  as.data.frame() %>%
  sort_points(y = "y", x = "x") %>%
  mutate(region = "megakotas")

arr <- USArrests %>% 
  add_rownames("region") %>%
  mutate(region = replace(tolower(region), tolower(region) %in% c("north dakota", "south dakota"), 
                          "megakotas")) %>%
  group_by(region) %>%
  mutate_all(sum) 

ggplot(mapping = aes(map_id=region, fill = Murder)) +
  geom_map(data = filter(arr, region != "megakotas"),
           map = us_coord, size = 0.15, color = "#ffffff") +
  geom_map(data = filter(arr, region == "megakotas"),
           map = megakotas_coord, size = 0.15, color = "#ffffff") + 
  expand_limits(x = us_coord$long, y = us_coord$lat) +
  scale_fill_continuous(low = 'thistle2', high = 'darkred', guide = 'colorbar') +
  labs(x=NULL, y=NULL) +
  coord_map("albers", lat0 = 39, lat1 = 45)  +
  theme(panel.border = element_blank(),
      panel.background = element_blank(),
      axis.ticks = element_blank(),
      axis.text = element_blank())

enter image description here

Note: This solution would be much simpler if map_data respects boundary = TRUE and interior = FALSE, which it should according to the documentation of map_data (?map_data clearly says one can pass other arguments to maps::map()). Somehow map_data("state", region = c("north dakota", "south dakota"), boundary = TRUE, interior = FALSE) doesn't seem to work.

Upvotes: 1

Related Questions