John Paper

Reputation: 83

Mapping variables to hexagon size and color with hex_bin

A while ago I found this code for mapping a variable to hexagon size, and I tried to modify it to draw basketball shot charts. I know there have been other threads like this one, but none of the ones I've read answered my question. The first one does help, but I'm stuck on one small problem:

Let's say I have a data frame of 250 observations (shots) with 4 variables: x, y, value (the points a shot is worth; in basketball that's 2 or 3, depending on how far from the basket you take the shot), and outcome (1 for a made shot, 0 for a miss).

Example:

         x         y value outcome
1 169.7650 -316.5726     3       0
2  75.0775 -182.3126     2       0
3  94.0150 -147.4050     2       1
4 109.1650 -138.0068     2       0
5  87.7025 -146.0624     2       1

# dput below:

structure(list(x = c(169.765, 75.0775, 94.015, 109.165, 87.7025), 
y = c(-316.5726, -182.3126, -147.405, -138.0068, -146.0624), 
value = c(3L, 2L, 2L, 2L, 2L), outcome = c(0L, 0L, 1L, 0L, 1L)), 
.Names = c("x", "y", "value", "outcome"), class = "data.frame", row.names = c(NA, -5L))

The y coordinates are negative because the origin (0, 0) is in the top left corner. With the code from the first thread I linked above I was able to bin my data; I just can't figure out how to operate on the binned data. This is what I have so far:

Imgur-Link

From this code:

# devtools::install_git("https://github.com/hadley/densityvis.git")

library(densityvis)
library(ggplot2)  # needed for ggplot() and geom_polygon() below

bin = hex_bin(df$x, df$y, var4=df$value, frequency.to.area=TRUE)
hexes = hex_coord_df(x=bin$x, y=bin$y, 
                     width=attr(bin,"width"), height=attr(bin,"height"),
                     size=bin$size)
hexes$rightness = rep(bin$col, each=6)

ggplot(hexes, aes(x=x, y=y)) + geom_polygon(aes(fill=rightness, group=id))

Hexagon size shows how many shots were taken from a given area, and color shows the point value of the shots from that area. What I want instead is something like points per shot: sum the points scored in each bin, then divide by the number of shots taken there. That ranges from 0 (no shots made) to 3 (every shot made from a 3-point area). I also want to display only bins with at least two shots taken.

I know it is a lot to ask, and it's my problem that I can't do it on my own. But if anyone had the time, any help would be much appreciated.

Edit: I uploaded the CSV sample that created the above image here. I don't know if it's cool to post 300 lines of code in a question, so I linked geotheory's code here instead. My slightly modified example is in the code block above; I just ran

df <- read.csv("sample_data.csv", header=TRUE)

beforehand.

Upvotes: 2

Views: 1150

Answers (1)

geotheory

Reputation: 23650

As the hex_bin code stands, observations with a value of zero are filtered out. This can be changed by removing the & var4 > 0 condition from clean_xy (line 117 on GitHub). Then the following:

# Points scored on each shot: the shot's value if it went in, otherwise 0
df$pts = ifelse(df$outcome == 1, df$value, 0)
bin = hex_bin(df$x, df$y, var4=df$pts, frequency.to.area=TRUE)
hexes = hex_coord_df(x=bin$x, y=bin$y, width=attr(bin,"width"), height=attr(bin,"height"), size=bin$size)
hexes$points = rep(bin$col, each=6)
ggplot(hexes, aes(x=x, y=y)) + geom_polygon(aes(fill=points, group=id))

gives you:

[Image: the resulting hexagon plot]

Is that what you're looking for?
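
If you want to check the per-bin numbers independently of hex_bin, here is a rough base-R sketch of the points-per-shot calculation, including the "at least two shots" filter. It is not the hex_bin method: the square cells and the names cell, bin and pps are assumptions made only for illustration, so swap in your real bin assignment.

```r
# The five sample rows from the question, rebuilt as a data frame.
df <- data.frame(
  x = c(169.765, 75.0775, 94.015, 109.165, 87.7025),
  y = c(-316.5726, -182.3126, -147.405, -138.0068, -146.0624),
  value = c(3L, 2L, 2L, 2L, 2L),
  outcome = c(0L, 0L, 1L, 0L, 1L)
)

# Points scored on each shot: the shot's value if made, otherwise 0.
df$pts <- df$value * df$outcome

# ASSUMPTION: square cells of side `cell` stand in for the hexagons.
cell <- 50
df$bin <- paste(floor(df$x / cell), floor(df$y / cell), sep = "_")

# Shots taken and points scored in each bin.
shots  <- tapply(df$pts, df$bin, length)
points <- tapply(df$pts, df$bin, sum)

# Points per shot, keeping only bins with at least two shots taken.
pps <- data.frame(bin   = names(shots),
                  shots = as.vector(shots),
                  pps   = as.vector(points / shots))
pps <- pps[pps$shots >= 2, ]
print(pps)
```

On the five sample rows only one cell holds two shots (rows 3 and 5, both made two-pointers), so a single bin survives the filter, with 2 points per shot.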

Upvotes: 3
