Matias Andina
Matias Andina

Reputation: 4230

Count points inside polygons by factor in R

I have two data sets of xy coordinates. The first one has xy coordinates plus a tag column with my factor levels. I called the data.frame qq and it looks like this:

structure(list(x = c(5109, 5128, 5137, 5185, 5258, 5324, 5387, 
5343, 5331, 5347, 5300, 5180, 4109, 4082, 4091, 4139, 4212, 4279, 
4291, 4297, 4285, 4301, 4254, 4181), y = c(1692, 1881, 2070, 
2119, 2144, 2065, 1987, 1813, 1705, 1649, 1631, 1654, 1847, 2015, 
2204, 2253, 2278, 2282, 2166, 1947, 1839, 1783, 1765, 1783), 
    tag = c("MPN_right", "MPN_right", "MPN_right", "MPN_right", 
    "MPN_right", "MPN_right", "MPN_right", "MPN_right", "MPN_right", 
    "MPN_right", "MPN_right", "MPN_right", "MPN_left", "MPN_left", 
    "MPN_left", "MPN_left", "MPN_left", "MPN_left", "MPN_left", 
    "MPN_left", "MPN_left", "MPN_left", "MPN_left", "MPN_left"
    )), .Names = c("x", "y", "tag"), row.names = c(NA, -24L), class = "data.frame") 

I generated random data for the other one using qq xy means with a large sd.

set.seed(123)
my_points=data.frame(x=rnorm(n =1000,mean=mean(qq$x),sd=1000),
y=rnorm(n=1000,mean=mean(qq$y),sd=1000))

if I use the in.out function from mgcv package, I obtain somewhat what I want.

The main issues with this approach is that my 'Polygon' is not closed nor will be interpreted as 2 polygons by factor. The package advises to use one NA row in between but I'd rather use my tag column since I will be trying to use more than 2 levels in my tag factor, namely, more than 2 polygons). My final goal is to produce a table of the number of points within each one.

Upvotes: 3

Views: 1035

Answers (3)

cuttlefish44
cuttlefish44

Reputation: 6796

lapply() and sapply() help you to use function par level.

  ## a bit edited to make output clear

library(dplyr); library(mgcv)

TAG <- unique(qq$tag)

IN.OUT <- lapply(TAG, function(x) as.matrix(qq[qq$tag==x, 1:2])) %>%  # make a matrix par level
  sapply(function(x) in.out(x, as.matrix(my_points)))        # use in.out() with each matrix

colnames(IN.OUT) <- TAG

head(IN.OUT, n = 3)

#      MPN_right MPN_left
# [1,]     FALSE    FALSE
# [2,]     FALSE    FALSE
# [3,]     FALSE    FALSE

apply(IN.OUT, 2, table)

#       MPN_right MPN_left
# FALSE       983      990
# TRUE         17       10

Upvotes: 1

lbusett
lbusett

Reputation: 5932

what about this:

mysppoint <- SpatialPoints(coords = my_points)  # create spatial points
qq$tag <- as.factor(qq$tag)
polys = list()

# create one polygon for each factor level
for (lev in levels(qq$tag)){
  first_x <- qq$x[qq$tag == lev][1]
  first_y <- qq$y[qq$tag == lev][1]
  qq <- rbind(qq, data.frame(x = first_x, y = first_y, tag = lev))  # "close" the polygon by replicating the first row
  polys[[lev]] <- Polygons(list(Polygon(matrix(data = cbind(qq$x[qq$tag == lev], # transform to polygon
                                                            qq$y[qq$tag == lev]), 
                                               ncol = 2))), lev)
}

mypolys <-  SpatialPolygons(polys)   # convert to spatial polygons
inters  <-  factor(over(mysppoint, mypolys), labels = names(mypolys)) # intersect points with polygons
table(inters)

, which gives:

inters
 MPN_left MPN_right 
       10        17 

advantage of this is that it gives you proper spatial objects to work with. For example:

plotd <- fortify(mypolys )
p <- ggplot()
p <- p + geom_point(data = my_points, aes(x = x , y = y), size = 0.2)
p <- p + geom_polygon(data = plotd, aes(x = long, y = lat, fill = id), alpha = 0.7)
p

Plot of polygons and points

Upvotes: 2

Matias Andina
Matias Andina

Reputation: 4230

I ended up using lapply and a combination of split and more lapply. So here is the code, please ignore the extract_coords helper function, it basically gives me a dataframe with the x,y and tag columns. I also managed to subset the points from the original your_coords and count them (returning them as a vector instead of a table).

inside_ROI = function(your_ROI_zip,your_coords){

  # Helper function will take list from zip ROIs and merge them into a df  
  qq=extract_coords(your_ROI_zip)  

  # We use tag for splitting by the region 

  lista=split(qq,qq$tag)

  # We check if they are in or out
  who_is_in = lapply(lista,function(t) in.out(cbind(t$x,t$y),x=cbind(your_coords$x,your_coords$y)))

  # We sum to get the by area countings
  region_sums = unlist(lapply(who_is_in,function(t) sum(as.numeric(t))))

  # obtain indices for subset of TRUE values
  aa=lapply(who_is_in,function(p) which(p==T))

  whos_coords=list()

for (i in aa){
  whos_coords = append(whos_coords,values=list(your_coords[i,]))
  #  whos_coords[[i]] = your_coords[i,]   
  }

  # Change names
  names(whos_coords) = names(aa)

  # Put into list for more than one output  
  out=list(region_sums,aa,whos_coords)  

  return(out)
}

Upvotes: 1

Related Questions