I have a large dataset of individuals located in a study area. I want to change the sampling design by dividing the area into a predefined grid. Here is a reproducible example dataset:
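set.seed(1456)
n <- 100
x <- rnorm(n)
y <- 1:n
df <- data.frame(x = x, y = y,
                 sp = sample(letters[1:5], size = n, replace = TRUE),
                 stringsAsFactors = TRUE)
# The index y goes on the horizontal axis and x on the vertical axis;
# point colours come from the levels of the species factor
plot(x = df$y, y = df$x, pch = 21,
     bg = df$sp,
     col = df$sp,
     cex = 0.4)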
This creates a grid over the area I'm studying:
xytransect <- expand.grid(seq(0, n, 5), seq(min(x), max(x), .6))
This shows the "nodes" (intersections) of the grid:
points(xytransect, cex= 0.3, pch = 21,
bg = "pink",
col = "pink")
This draws the actual grid lines over the area:
abline(v = seq(0, n, 5), h = seq(min(x), max(x), .6))
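Drawing the lines does not by itself say which square each individual falls in. A minimal sketch, assuming the same break vectors as the abline() call above, assigns each individual a column and row index with base R's findInterval(); the names v_breaks, h_breaks, col_id, row_id and outside are illustrative only. Because seq(min(x), max(x), .6) usually stops short of max(x), the largest x values lie above the last horizontal line and receive an index equal to the number of breaks.

```r
# Sketch: assign each individual to a square of the question's grid,
# using the same break vectors as the abline() call
set.seed(1456)
n <- 100
x <- rnorm(n)
y <- 1:n

v_breaks <- seq(0, n, 5)             # vertical lines, along y (the index)
h_breaks <- seq(min(x), max(x), 0.6) # horizontal lines, along x

# findInterval() returns i such that breaks[i] <= value < breaks[i + 1];
# rightmost.closed = TRUE keeps a value equal to the last break in the last square
col_id <- findInterval(y, v_breaks, rightmost.closed = TRUE)
row_id <- findInterval(x, h_breaks, rightmost.closed = TRUE)

# Values above the last horizontal line get row_id == length(h_breaks),
# i.e. they are not inside any closed square of the drawn grid
outside <- row_id == length(h_breaks)
table(outside)
```

Extending the last break by one step, for example seq(min(x), max(x) + 0.6, 0.6), is one way to make every individual fall inside a closed square.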
The idea is to group the species and see how many are present within each square of the grid.
I was able to group the species (here, letters) by name over the whole area, but how can I group them by the squares of the grid I created?
library(dplyr)
df %>%
  group_by(sp) %>%
  summarise(n = n())
Would it be possible to get the center of each square and colour the square by the number of species (letters) it contains?
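One possible base R approach, sketched under the assumption that the grid is the question's grid extended by one 0.6 step (so no point falls outside it): count the distinct species per occupied square with aggregate(), compute each square's center, and draw the squares with rect(), coloured by that count. The palette and the names v_breaks, h_breaks and cells are illustrative choices, not part of the question.

```r
# Sketch (assumed approach): colour each grid square by the number of
# distinct species inside it, using only base R
set.seed(1456)
n <- 100
x <- rnorm(n)
y <- 1:n
df <- data.frame(x = x, y = y,
                 sp = sample(letters[1:5], size = n, replace = TRUE),
                 stringsAsFactors = TRUE)

# Assumption: the question's grid, extended by one 0.6 step so that
# the largest x values also fall inside a closed square
v_breaks <- seq(0, n, 5)
h_breaks <- seq(min(x), max(x) + 0.6, 0.6)

# Column (along y) and row (along x) index of each individual's square
col_id <- findInterval(df$y, v_breaks, rightmost.closed = TRUE)
row_id <- findInterval(df$x, h_breaks, rightmost.closed = TRUE)

# Number of distinct species in each occupied square
cells <- aggregate(sp ~ col_id + row_id,
                   data = data.frame(sp = as.character(df$sp), col_id, row_id),
                   FUN = function(s) length(unique(s)))
names(cells)[names(cells) == "sp"] <- "n_spp"

# Center of each square
cells$x_center <- (v_breaks[cells$col_id] + v_breaks[cells$col_id + 1]) / 2
cells$y_center <- (h_breaks[cells$row_id] + h_breaks[cells$row_id + 1]) / 2

# Draw the squares coloured by species count, then overlay the individuals
pal <- hcl.colors(5, "YlOrRd", rev = TRUE)
plot(df$y, df$x, type = "n", xlab = "y", ylab = "x")
rect(v_breaks[cells$col_id], h_breaks[cells$row_id],
     v_breaks[cells$col_id + 1], h_breaks[cells$row_id + 1],
     col = pal[cells$n_spp], border = "grey")
points(df$y, df$x, pch = 21, cex = 0.4, bg = "black")
legend("topright", legend = 1:5, fill = pal, title = "Species")
```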
Here is an answer using ggplot2's geom_tile() to plot the tiles filled by the number of unique species found in each tile. This is what the OP requested, and it is different from the number of individuals per tile.
library(dplyr)
library(ggplot2)
# Add some excess to the limits to ensure that all points are captured,
# even those on the edges.
xcoords <- seq(min(x) - 1, max(x) + 1, .6)
ycoords <- seq(-5, n + 5, 5)
# Determine the cell index and its coordinates for each individual:
# the index is that of the first break strictly greater than the value.
df <- df %>%
  mutate(x_cell_index = sapply(x, function(z) which(z < xcoords)[1]),
         x_cell_min = xcoords[x_cell_index - 1],
         x_cell_max = xcoords[x_cell_index],
         y_cell_index = sapply(y, function(z) which(z < ycoords)[1]),
         y_cell_min = ycoords[y_cell_index - 1],
         y_cell_max = ycoords[y_cell_index])
# Summarize the number of unique species found in each cell.
df_cellcounts <- df %>%
  group_by(x_cell_min, x_cell_max, y_cell_min, y_cell_max) %>%
  summarize(n_spp = n_distinct(sp), .groups = "drop")
# Plot it: each tile is centred on its cell and sized to the cell dimensions.
ggplot(df_cellcounts, aes(x = (x_cell_min + x_cell_max) / 2,
                          y = (y_cell_min + y_cell_max) / 2,
                          fill = factor(n_spp))) +
  geom_tile(width = 0.6, height = 5) +
  labs(x = "x", y = "y", fill = "Number of species")
This produces a plot with one tile per occupied cell, filled according to the number of unique species in that cell.
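The sapply()/which() lookup above scans the break vector once per individual. As a sketch, base R's findInterval() gives the same indices in one vectorised call: it returns i with breaks[i] <= z < breaks[i + 1], so the first break strictly greater than z sits at i + 1. This assumes, as the padded limits guarantee here, that every value is below the last break (otherwise which() returns NA while findInterval() returns length(breaks) + 1).

```r
# Sketch: vectorised replacement for sapply(x, function(z) which(z < xcoords)[1])
set.seed(1456)
n <- 100
x <- rnorm(n)
y <- 1:n

# Padded breaks, as in the answer
xcoords <- seq(min(x) - 1, max(x) + 1, 0.6)
ycoords <- seq(-5, n + 5, 5)

# Index of the first break strictly greater than each value (answer's method)
x_idx_sapply <- sapply(x, function(z) which(z < xcoords)[1])
y_idx_sapply <- sapply(y, function(z) which(z < ycoords)[1])

# Same indices via findInterval(); 1L keeps the result an integer vector
x_idx_fi <- findInterval(x, xcoords) + 1L
y_idx_fi <- findInterval(y, ycoords) + 1L

identical(x_idx_sapply, x_idx_fi)
identical(y_idx_sapply, y_idx_fi)
```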
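I've edited this answer to use the same bin definitions as in the question. It counts the number of individuals (not species) in each cell and scales the point size at each cell center by that count.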
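library(dplyr)
# Bins matching the question's grid: steps of 5 along the index and 0.6
# along x, padded by one step so every point falls inside a bin.
ibins <- seq(0, nrow(df) + 5, 5)
jbins <- seq(min(df$x) - 0.6, max(df$x) + 0.6, .6)
out <- df %>%
  # Lower-left corner of each individual's cell; row_number() equals y here
  mutate(i = min(ibins) + 5 * (cut(row_number(), breaks = ibins, labels = FALSE) - 1),
         j = min(jbins) + 0.6 * (cut(x, breaks = jbins, labels = FALSE) - 1)) %>%
  group_by(i, j) %>%
  summarise(count = n(), .groups = "drop") %>%  # individuals per cell
  mutate(i_center = i + 2.5,
         j_center = j + 0.3)
# Point size is proportional to the number of individuals in each cell
plot(out$i_center, out$j_center, cex = out$count / max(out$count),
     pch = 21, col = "orange", bg = "orange")
abline(v = seq(0, n, 5), h = seq(min(x), max(x), .6))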
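This answer counts individuals per cell (count = n()), whereas the question asks about species. A minimal sketch, keeping the same cut()-based binning, adds a distinct-species count with dplyr's n_distinct() and colours square markers at the cell centers by it; out_spp and the palette are illustrative choices, and the markers have a fixed size rather than the exact cell dimensions.

```r
library(dplyr)

set.seed(1456)
n <- 100
x <- rnorm(n)
y <- 1:n
df <- data.frame(x = x, y = y,
                 sp = sample(letters[1:5], size = n, replace = TRUE),
                 stringsAsFactors = TRUE)

# Same bins as in the answer
ibins <- seq(0, nrow(df) + 5, 5)
jbins <- seq(min(df$x) - 0.6, max(df$x) + 0.6, 0.6)

out_spp <- df %>%
  mutate(i = min(ibins) + 5 * (cut(row_number(), breaks = ibins, labels = FALSE) - 1),
         j = min(jbins) + 0.6 * (cut(x, breaks = jbins, labels = FALSE) - 1)) %>%
  group_by(i, j) %>%
  summarise(count = n(),              # individuals per cell (as in the answer)
            n_spp = n_distinct(sp),   # distinct species per cell
            .groups = "drop") %>%
  mutate(i_center = i + 2.5,
         j_center = j + 0.3)

# Square markers at the cell centers, coloured by the number of species
pal <- hcl.colors(5, "YlOrRd", rev = TRUE)
plot(out_spp$i_center, out_spp$j_center, pch = 22, cex = 2,
     bg = pal[out_spp$n_spp], col = "grey", xlab = "y", ylab = "x")
abline(v = seq(0, n, 5), h = seq(min(x), max(x), 0.6))
```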