I am trying to create a map that uses circles to show the cities where the subjects in my data set came from. I would like each circle's size to be proportional to the number of people from that city in my data. I would also like a second circle inside each one, showing how many of those people have the disease.
I have started by using ggmap to get the longitudes and latitudes:
library(ggplot2)
library(maps)
library(ggmap)
geocode("True Blue, Grenada")
I'm stuck because I don't know how to continue. I can't use a map of the US alone because one of the locations is in the Caribbean.
Here is a short sample of my data; the actual data set is far too large to post.
subjectid  location            disease
12         Atlanta, GA         yes
15         Boston, MA          no
13         True Blue, Grenada  yes
85         True Blue, Grenada  yes
46         Atlanta, GA         yes
569        Boston, MA          yes
825        True Blue, Grenada  yes
685        Atlanta, GA         no
54         True Blue, Grenada  no
214        Atlanta, GA         no
685        Boston, MA          no
125        True Blue, Grenada  yes
569        Boston, MA          no
Can someone please help?
This should get you started. It does not draw circles within circles: ggplot can map different variables to the same aesthetic (size), but it takes some work. Here, the size of each point represents the total number of subjects from that city, and its colour represents the number with the disease. You will need to adjust the size range in scale_size() for your full data set.
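If you do want a circle within a circle, one way is to draw two sets of circles on one common scale: an outer circle sized by the total count and an inner circle sized by the number with the disease. The sketch below swaps in base R's symbols() instead of ggplot, and it hard-codes the counts from the sample data together with approximate city coordinates; those coordinates are an assumption standing in for the geocoded values. Radii grow with the square root of the counts so that circle areas, not radii, track the number of people.

```r
# Sketch: concentric "circle within a circle" per city using base graphics.
# ASSUMPTION: counts come from the sample data; lon/lat are approximate,
# hard-coded stand-ins for what geocode() would return.
counts <- data.frame(
  location = c("Atlanta, GA", "Boston, MA", "True Blue, Grenada"),
  lon = c(-84.39, -71.06, -61.77),
  lat = c(33.75, 42.36, 12.00),
  countTotal = c(4, 4, 5),
  countDisease = c(2, 1, 4),
  stringsAsFactors = FALSE
)

# Area proportional to count: radius = k * sqrt(count).
# One shared k keeps outer and inner circles on the same scale.
k <- 2  # degrees of longitude per sqrt(person); tune for the full data set
counts$rTotal <- k * sqrt(counts$countTotal)
counts$rDisease <- k * sqrt(counts$countDisease)

pdf(NULL)  # off-screen device; remove this and dev.off() to draw on screen
plot(counts$lon, counts$lat, type = "n", asp = 1,
     xlim = range(counts$lon) + c(-10, 10),
     ylim = range(counts$lat) + c(-10, 10),
     xlab = "lon", ylab = "lat")
# Outer circles: all subjects from each city (radius in x-axis units)
symbols(counts$lon, counts$lat, circles = counts$rTotal,
        inches = FALSE, add = TRUE, bg = "grey80")
# Inner circles: the subset with the disease, drawn on top
symbols(counts$lon, counts$lat, circles = counts$rDisease,
        inches = FALSE, add = TRUE, bg = "red")
# Label each city with the part of the name before the comma
text(counts$lon, counts$lat, sub(",.*$", "", counts$location), pos = 4)
invisible(dev.off())
```

On real data you would draw the world map first (for example with maps::map()) and then call symbols() with add = TRUE; k then needs tuning so the largest circles do not swamp the map.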
The code below gets the geographic locations of the cities and merges them back into the data frame. It then summarises the data into a data frame containing the required counts for each city. The map is drawn with boundaries set a few degrees beyond the minimum and maximum longitude and latitude of the cities, so the US cities and Grenada all fit. The last step plots the cities and their counts on the map.
# load libraries
library(ggplot2)
library(maps)
library(ggmap)
library(grid)
library(plyr)
# Your data
df <- read.table(header = TRUE, text = "
subjectid location disease
12 'Atlanta, GA' yes
15 'Boston, MA' no
13 'True Blue, Grenada' yes
85 'True Blue, Grenada' yes
46 'Atlanta, GA' yes
569 'Boston, MA' yes
825 'True Blue, Grenada' yes
685 'Atlanta, GA' no
54 'True Blue, Grenada' no
214 'Atlanta, GA' no
685 'Boston, MA' no
125 'True Blue, Grenada' yes
569 'Boston, MA' no", stringsAsFactors = FALSE)
# Get the geographic locations and merge them into the data frame
# (recent versions of ggmap require a Google API key, registered with register_google())
geoloc <- geocode(unique(df$location))
pos <- data.frame(location = unique(df$location), geoloc, stringsAsFactors = FALSE)
df <- merge(df, pos, by = "location", all = TRUE)
# Summarise: one row per city with the number diseased and the total number of subjects
df <- ddply(df, .(location, lon, lat), summarise,
countDisease = sum(disease == "yes"),
countTotal = length(location))
# Plot the map
mp1 <- fortify(map(fill = TRUE, plot = FALSE))
xmin <- min(df$lon) - 5
xmax <- max(df$lon) + 7
ymin <- min(df$lat) - 5
ymax <- max(df$lat) + 5
Amap <- ggplot() +
geom_polygon(aes(x = long, y = lat, group = group), data = mp1, fill = "grey", colour = "grey") +
coord_cartesian(xlim = c(xmin, xmax), ylim = c(ymin, ymax)) +
theme_bw()
# Plot the cities and counts
Amap <- Amap + geom_point(data = df, aes(x = lon, y = lat, size = countTotal, colour = countDisease)) +
geom_text(data = df, aes(x = lon, y = lat, label = gsub(",.*$", "", location)), size = 2.5, hjust = -.3) +
scale_size(range = c(3, 10)) +
scale_colour_gradient(low = "blue", high = "red")
# Draw the map
print(Amap)
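If geocode() is not available to you, the merge and summarise steps can still be checked offline. The sketch below replaces ddply() with base R's aggregate() and uses approximate, hard-coded coordinates as a stand-in for the geocode() results; those coordinates are an assumption, not geocoder output.

```r
# Base-R sketch of the merge and summarise steps (no plyr, no geocoding).
# ASSUMPTION: 'pos' holds approximate, hard-coded coordinates standing in
# for the output of geocode(), so this runs without an API key.
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
subjectid location disease
12 'Atlanta, GA' yes
15 'Boston, MA' no
13 'True Blue, Grenada' yes
85 'True Blue, Grenada' yes
46 'Atlanta, GA' yes
569 'Boston, MA' yes
825 'True Blue, Grenada' yes
685 'Atlanta, GA' no
54 'True Blue, Grenada' no
214 'Atlanta, GA' no
685 'Boston, MA' no
125 'True Blue, Grenada' yes
569 'Boston, MA' no")

pos <- data.frame(
  location = c("Atlanta, GA", "Boston, MA", "True Blue, Grenada"),
  lon = c(-84.39, -71.06, -61.77),
  lat = c(33.75, 42.36, 12.00),
  stringsAsFactors = FALSE
)

# Attach the coordinates to every subject (one row per subject is kept)
df <- merge(df, pos, by = "location", all = TRUE)

# One row per city: number with the disease (sum of TRUEs) and total subjects
counts <- aggregate(
  data.frame(countDisease = df$disease == "yes", countTotal = 1),
  by = list(location = df$location, lon = df$lon, lat = df$lat),
  FUN = sum
)
print(counts)
```

The resulting counts data frame has the same columns as the ddply() result, so it can be passed to the geom_point() and geom_text() layers unchanged.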