Reputation: 97
I am having issues thinking of a way around this geographic mapping problem in ggplot2. The issue is that ggplot is not filling in data for some states and leaving them blank. This makes sense, as those states don’t have any value based on my fill.
I know I could add rows for those states and just fill them with 0s, but which states have no value will change over time. I am trying to build this to be automated, so that whoever does this month to month only has to save the file and hit run, so I want this to update on its own.
In a perfect world, states with no values would be labeled differently on the axis as “no penetration”.
GGplot code:
map <- ggplot(penetration_levels, aes(long, lat, group=region, fill=Penetration)) +
  geom_polygon() +
  coord_equal() +
  scale_fill_gradient2(low="red", mid="white", high="green", midpoint=.25)
map
map<-map+geom_point(
data=mydata, aes(x=long, y=lat,group=1,fill=0, size=Annualized.Opportunity),
color="gray6") +
scale_size(name="Total Annual Opportunity-Millions",range=c(2,4))
map<-map+theme(plot.title = element_text(size = 12,face="bold"))
map
Head of my data and penetration
head(mydata)
Sold.To.Customer City State Annualized.Opportunity location lat long
21 10000110 NEW YORK NY 12.142579 NEW YORK,NY 40.71435 -74.00597
262 10016487 FORT LAUDERDALE FL 12.087310 FORT LAUDERDALE,FL 26.12244 -80.13732
349 11001422 ALLEN PARK MI 10.910575 ALLEN PARK,MI 42.25754 -83.21104
19 10000096 ALTON IL 10.040067 ALTON,IL 38.89060 -90.18428
477 11067228 BAY CITY TX 10.030829 BAY CITY,TX 28.98276 -95.96940
230 10014909 BETHPAGE NY 9.320271 BETHPAGE,NY 40.74427 -73.48207
head(penetration_levels)
State region long lat group order subregion state To From Total Penetration
17 AL alabama -87.46201 30.38968 1 1 <NA> AL 10794947 12537359 23332307 0.462661
18 AL alabama -87.48493 30.37249 1 2 <NA> AL 10794947 12537359 23332307 0.462661
22 AL alabama -87.52503 30.37249 1 3 <NA> AL 10794947 12537359 23332307 0.462661
36 AL alabama -87.53076 30.33239 1 4 <NA> AL 10794947 12537359 23332307 0.462661
37 AL alabama -87.57087 30.32665 1 5 <NA> AL 10794947 12537359 23332307 0.462661
65 AL alabama -87.58806 30.32665 1 6 <NA> AL 10794947 12537359 23332307 0.462661
merge:
#geocode
geocode<-geocode(mydata$location)
mydata$lat<-geocode$lat
mydata$long<-geocode$lon
#create us map and graph
states<-map_data("state")
#merge states
states<-merge(states,statelookup,by="region")
penetration_levels<-merge(states,penetration_levels,by="State")
penetration_levels<- penetration_levels[order(penetration_levels$order), ]
Then it goes directly into map plot
Upvotes: 1
Views: 2033
Reputation: 59345
So this turns out to be a common problem. Generally choropleth maps require some sort of merge of the map data with the dataset containing the information used to set the polygon fill colors. In OP's case this is done as follows:
states <- map_data("state")
states <- merge(states,statelookup,by="region")
penetration_levels <- merge(states,penetration_levels,by="State")
The problem is that, if penetration_levels has any missing States, these rows will be excluded from the merge (in database terminology, this is an inner join). So in rendering the map, those polygons will be missing. The solution is to use:
penetration_levels <- merge(states,penetration_levels,by="State",all.x=TRUE)
This returns all rows of the first argument (the "x" argument), merged with any data from matching states in the second argument (a left join). Missing values are set to NA.
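To see the difference on a small scale, here is a minimal sketch with two made-up data frames. The names polys and vals and their values are illustrative only, not the OP's data:

```r
# Toy stand-ins: 'polys' plays the role of the map data, 'vals' the fill data.
polys <- data.frame(State = c("AL", "AL", "AK", "AZ"),
                    order = 1:4,
                    stringsAsFactors = FALSE)
vals  <- data.frame(State = c("AL", "AZ"),
                    Penetration = c(0.46, 0.30),
                    stringsAsFactors = FALSE)

# The default merge is an inner join: the AK row has no match and is dropped.
inner <- merge(polys, vals, by = "State")

# all.x = TRUE keeps every row of 'polys'; AK gets Penetration = NA.
left <- merge(polys, vals, by = "State", all.x = TRUE)

nrow(inner)                          # 3
nrow(left)                           # 4
left$State[is.na(left$Penetration)]  # "AK"
```

Note that merge() re-sorts rows by the key, which is why the OP's code re-orders by the order column afterwards.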
The fill color of polygons (states) with NA values is set by default to grey50, but can be changed by adding the following call to the plot definition:
scale_fill_gradient(na.value="red")
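Applied to the plot in the question, one option is to set na.value on the existing scale_fill_gradient2() call instead, since adding a second fill scale to a plot replaces the first one. The sketch below is only an illustration: toy is a made-up two-square "map" standing in for the merged state polygons, and "grey80" is an arbitrary color choice.

```r
library(ggplot2)

# Hypothetical stand-in for the merged map data: two unit squares,
# one with a Penetration value and one left as NA by the left join.
toy <- data.frame(
  long        = c(0, 1, 1, 0,  2, 3, 3, 2),
  lat         = c(0, 0, 1, 1,  0, 0, 1, 1),
  group       = rep(c(1, 2), each = 4),
  Penetration = rep(c(0.4, NA), each = 4)
)

p <- ggplot(toy, aes(long, lat, group = group, fill = Penetration)) +
  geom_polygon(color = "black") +
  coord_equal() +
  # Same diverging scale as in the question, plus an explicit NA color.
  scale_fill_gradient2(low = "red", mid = "white", high = "green",
                       midpoint = .25, na.value = "grey80")
print(p)
```

Picking an NA color that is not on the red-white-green gradient keeps "no data" states visually distinct from low-penetration ones.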
Upvotes: 2
Reputation: 93771
Couldn't you add a check for missing states and add rows (with zero for penetration) for them to your data frame? A simple example:
# Create a generic data frame with zeros for penetration
zeros.data = data.frame(State=as.character(state.abb), penetration=0)
# Create a simplified analogue of your data
penetration_levels = data.frame(State=as.character(state.abb[1:30]),
penetration=runif(30,0.1,1))
# Get values for missing states
missing.states = setdiff(state.abb, unique(penetration_levels$State))
# Get required data for missing states.
penetration_levels = rbind(penetration_levels,
zeros.data[zeros.data$State %in% missing.states,])
You could do a check like this before running your plotting code to automatically fill out your data frame with zero penetration for all missing states. Of course, your "zeros.data" data frame would have to include the other columns in your original data frame, filled with NAs or with whatever data you need for plotting.
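For a month-to-month script, the same check could be wrapped in a small function. This is only a sketch: fill_missing_states, value_col and all_states are hypothetical names, and it assumes the state column is called State as in the example above. Any extra columns are filled with NA for the added rows.

```r
# Hypothetical helper: append a zero-valued row for every state in
# 'all_states' that is absent from 'df'; other columns are left as NA.
fill_missing_states <- function(df, value_col = "penetration",
                                all_states = state.abb) {
  df$State <- as.character(df$State)
  missing  <- setdiff(all_states, df$State)
  if (length(missing) == 0) return(df)
  # Rows of NA with the same columns and types as 'df'.
  filler <- df[rep(NA_integer_, length(missing)), , drop = FALSE]
  filler$State        <- missing
  filler[[value_col]] <- 0
  out <- rbind(df, filler)
  rownames(out) <- NULL
  out
}

# Example: only the first 30 states have data this month.
set.seed(1)
monthly <- data.frame(State = state.abb[1:30],
                      penetration = runif(30, 0.1, 1),
                      stringsAsFactors = FALSE)
filled <- fill_missing_states(monthly)
nrow(filled)  # 50
```

Keep in mind that a zero is plotted as a real value on the color scale, so this approach does not distinguish "no data" from "zero penetration".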
Upvotes: 0