aport550
aport550

Reputation: 119

Find the Number of Points Within a Certain Radius of a Data Point in R

I have 2 datasets one for hospitals and another for procedures. Each dataset has latitude and longitude coordinates. Procedures are either given in or out of the hospital, though the coordinates are not necessarily precise if given in the hospitals. Im trying to form a radius of a certain size around each of the hospitals and determine how many procedure points fall within that radius on average. So if for example I have 100 hospitals and 3000 procedures, I want to form a radius around all hospitals and see on average how many hospitals fall into that specified radius. My initial code is below, but I know this can be done faster. coded in R. Thanks!

for(i in 1:NROW(hospitals)){
  hospital <- hospitals[i,]
  radius <- .016

  # find all the procedures that lie in the .016 sized radius from this hospital

  hospital$latitude_low <- hospital$lat - radius
  hospital$longitude_low <- hospital$long - radius
  hospital$latitude_high <- hospital$lat + radius
  hospital$longitude_high <- hospital$long + radius

  in_rad <- procedures[(procedures$long >= hospital$longitude_low & procedures$long <= 
  hospital$longitude_high & procedures$lat <= hospital$latitude_high & procedures$lat >= 
  hospital$latitude_low),]

  num <- NROW(in_rad)
  hospitals[i,]$number_of_procedures <- num
}

Upvotes: 2

Views: 1818

Answers (2)

Robert Hijmans
Robert Hijmans

Reputation: 47091

When you ask a question, you should always include some example data. Like this

lat <- c(-23.8, -25.8)
lon <- c(-49.6, -44.6)
hosp <- cbind(lon, lat)


lat <- c(-22.8, -24.8, -29.1, -28, -20)
lon <- c(-46.4, -46.3, -45.3, -40, -30)
procedures <- cbind(lon, lat)

Are your data in longitude/latitude? If so, you need to use a proper method to compute distances. For example

 library(geosphere)
 dm <- distm(procedures, hosp)

Or

 library(raster)
 d <- pointDistance(procedures, hosp, lonlat=TRUE)

Both compute the distance from all procedures to all hospitals. This will fail with very large datasets, but from what you describe it should work fine. Now you can use a threshold (here 400,000 m) to find out which procedures are within that distance of each hospital

apply(d < 400000, 2, which)
#[[1]]
#[1] 1 2

#[[2]]
#[1] 1 2 3

So procedure 1, 2 and 3 are within that distance to hospital 2

If your data are not longitude/latitude, you can use

 d <- pointDistance(procedures, hosp, lonlat=FALSE)

Upvotes: 3

Allan Cameron
Allan Cameron

Reputation: 173793

There are a couple of things that could be improved here. Firstly, you are not actually calculating procedures done within a radius of 0.16 units from the hospital, but procedures done within a 0.32 * 0.32 units square with the hospital at its center. Probably not a huge deal for the specific problem, but its actually quicker to work out points within a particular distance, as you actually intended.

Secondly, you have have a tendency to store any variables you have calculated even if you're only going to use them once. This can help with understanding the code, but is sometimes less efficient and certainly makes your code longer, particularly if you like using long_descriptive_variable_names.

Thirdly, at the end, you subset procedures and then measure the number of rows, rather than just using the length of the subset itself.

Lastly (but less importantly), you write the result one value at a time into a new column. You can do this all in one gulp using sapply instead.

So your code could be replaced with something much simpler, like:

hospitals$number_of_procedures <- sapply(1:NROW(hospitals), function(i)
  {
    d <- (procedures$long - hospitals[i,]$long)^2 + (procedures$lat - hospitals[i,]$lat)^2
    length(which(d < 0.16^2))
  })

Upvotes: 1

Related Questions