M. Beausoleil

Reputation: 3555

How to calculate the number of points per square in a grid of a custom size?

I have a large dataset of individuals located within an area, and I want to change the sampling design by dividing the field (or space) with a predefined grid. Here is the dataset:

set.seed(1456)
n <- 100
x <- rnorm(n)
x
y <- 1:n
df <- data.frame(x = x, y = y, sp = sample(letters[1:5], size = n, replace = TRUE), stringsAsFactors = TRUE)

plot(y = df$x, x =  y, pch =21, 
     bg = df$sp,
     col = df$sp, 
     cex = .4)

This will create a grid of the area that I'm studying

xytransect <- expand.grid(seq(0, n, 5), seq(min(x), max(x), .6))

This is to show the "nodes" of the grid

points(xytransect, cex= 0.3, pch = 21, 
       bg  = "pink", 
       col = "pink")

This is just showing the actual grid on the area.

abline(v = seq(0, n, 5), h = seq(min(x), max(x), .6))


The idea is to group the species and see how many are present within each square of the grid.

I was able to group the species (here, letters) by name over the whole area. But how can I group them by the squares of the grid that I created?

library(dplyr)
df %>%  
  group_by(sp) %>% 
  summarise(n()) 

Would it be possible to get the center of each square and colour the square by the number of species (letters) it contains?

That's the plot from Jason's answer: [image]

Upvotes: 0

Views: 1294

Answers (2)

qdread

Reputation: 3943

Here is an answer using ggplot2's geom_tile() to plot tiles filled by the number of unique species found in each tile. That is what the OP requested, and it is different from the number of individuals per tile.

library(dplyr)
library(ggplot2)

# Add some excess to the limits to ensure that all points are captured,
# even those on the edges.
xcoords <- seq(min(x)-1, max(x)+1, .6)
ycoords <- seq(-5, n+5, 5)

# Determine cell index and its coordinates for each individual.
df <- df %>%
  mutate(x_cell_index = sapply(x, function(z) which(z < xcoords)[1]),
         x_cell_min = xcoords[x_cell_index - 1],
         x_cell_max = xcoords[x_cell_index],
         y_cell_index = sapply(y, function(z) which(z < ycoords)[1]),
         y_cell_min = ycoords[y_cell_index - 1],
         y_cell_max = ycoords[y_cell_index])

# Summarize the number of unique species found in each cell.
df_cellcounts <- df %>%
  group_by(x_cell_min, x_cell_max, y_cell_min, y_cell_max) %>%
  summarize(n_spp = n_distinct(sp), .groups = "drop")

# Plot it.
ggplot(df_cellcounts,
       aes(x = (x_cell_min + x_cell_max) / 2,
           y = (y_cell_min + y_cell_max) / 2,
           fill = factor(n_spp))) +
  geom_tile()

This produces the following plot: [image]
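As a possible variation on the cell lookup above, here is a minimal sketch that assigns cells with base R's findInterval() instead of sapply(), using the grid lines exactly as drawn in the question (no extra padding). The names y_breaks, x_breaks and cells are made up for this illustration, and the data are simply regenerated from the question's seed. Treat it as a sketch under those assumptions rather than a drop-in replacement.

```r
library(dplyr)

# Recreate the question's data (same seed and columns).
set.seed(1456)
n <- 100
df <- data.frame(x = rnorm(n), y = 1:n,
                 sp = sample(letters[1:5], size = n, replace = TRUE),
                 stringsAsFactors = TRUE)

# Grid lines as drawn in the question (hypothetical variable names).
y_breaks <- seq(0, n, 5)
x_breaks <- seq(min(df$x), max(df$x), 0.6)

cells <- df %>%
  mutate(
    # findInterval() returns the index of the grid line at or below each value,
    # i.e. cells are closed on the left: [line_k, line_k+1).
    y_idx = findInterval(y, y_breaks),
    x_idx = findInterval(x, x_breaks),
    # Cell centre = lower grid line + half the cell size.
    y_center = y_breaks[y_idx] + 5 / 2,
    x_center = x_breaks[x_idx] + 0.6 / 2
  ) %>%
  group_by(y_center, x_center) %>%
  summarise(n_ind = n(),              # individuals per cell
            n_spp = n_distinct(sp),   # unique species per cell
            .groups = "drop")
```

Values that sit on or beyond the last grid line (for example y = 100) land in one extra cell past the drawn grid, so its centre falls just outside the plotted lines. The resulting cells data frame has one row per occupied cell and could be passed to geom_tile() with fill = factor(n_spp), as in the plotting code above.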

Upvotes: 1

Jason

Reputation: 581

I've edited the response to use the same bin definition as in the question. Note that this counts individuals per cell.

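library(dplyr)

# Bin edges with the same spacing as the question's grid (5 along the index, 0.6 along x).
ibins <- seq(0, nrow(df)+5, 5)
jbins <- seq(min(df$x)-0.6, max(df$x)+0.6, .6)
xytransect <- expand.grid(seq(0, n, 5), seq(min(x), max(x), .6))

out <- df %>% 
  # i and j are the lower edges of the cell each individual falls in.
  mutate(i = min(ibins) + 5*(cut(row_number(), breaks= ibins,labels=FALSE)-1),
         j = min(jbins) + 0.6*(cut(x,breaks=jbins,labels=FALSE)-1)) %>%
  group_by(i,j) %>% 
  summarise(count=n()) %>%   # number of individuals per cell
  ungroup() %>%
  # Cell centres: lower edge plus half the cell size.
  mutate(i_center = i+2.5,
         j_center = j+0.3)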


plot(out$i_center, out$j_center, cex = out$count/max(out$count), pch = 21, col ="orange", bg = "orange")
abline(v = seq(0, n, 5), h = seq(min(x), max(x), .6))
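
One detail worth knowing when binning with cut(): by default its intervals are closed on the right, so a point lying exactly on a grid line is counted in the cell below that line. findInterval() uses the opposite convention. A small sketch with made-up values (breaks and vals are illustrative only):

```r
breaks <- seq(0, 20, 5)        # grid lines at 0, 5, 10, 15, 20
vals   <- c(1, 5, 6, 10, 20)

# cut() defaults to right-closed bins (a, b]: 5 belongs to the first bin.
cut_idx <- cut(vals, breaks = breaks, labels = FALSE)
cut_idx
# 1 1 2 2 4

# findInterval() uses left-closed bins [a, b): 5 opens the second bin,
# and 20 (the last grid line) gets an index one past the last full bin.
fi_idx <- findInterval(vals, breaks)
fi_idx
# 1 2 2 3 5
```

Which convention is preferable depends on where points on a grid line should be counted; cut() also accepts right = FALSE if left-closed bins are wanted.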


Upvotes: 1
