Reputation: 11
I am a beginner when it comes to R coding. I want to count the number of attractions within 2 km of each location; both the attractions and the locations are given by longitude and latitude. The main data set has around 29K records and there are 28 attractions. How can I rewrite the following code so that it performs better? The current version is crude and not good practice.
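library(geosphere)  # provides distVincentySphere()

# Preallocate the result: one count per listing
attr_count <- integer(nrow(mainData))
for (i in seq_len(nrow(mainData))) {
  loc_coord <- c(mainData$longitude[i], mainData$latitude[i])
  for (j in seq_len(nrow(ny_attractions))) {
    attr_coord <- c(ny_attractions$lon[j], ny_attractions$lat[j])
    # Great-circle distance in meters between attraction and listing
    dist <- distVincentySphere(attr_coord, loc_coord)
    if (dist <= 2000) {
      attr_count[i] <- attr_count[i] + 1L
    }
  }
}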
[EDIT]: My apologies for not putting it clearly earlier. Here's an example of what I am trying to achieve. I have 2 data sets -
Dataset - 1 (NYC_attractions) (27 records)
Dataset-2 (master data for house listings) (29K records)
Now I need to add one more column (num_of_attractions) to Dataset-2, giving the number of attractions within 2 km of each listing (i.e. one value per record in Dataset-2).
I hope this explains the problem.
Thanks
Upvotes: 0
Views: 46
Reputation: 2214
Hello, your question is partly answered here: https://stackoverflow.com/a/49860968/3042154. Since you use geodetic coordinates (lat/lon) instead of projected coordinates (meters), it can be done in two steps. First, roughly select potential neighbours using Euclidean distance on the raw coordinates, as in the linked answer. Then refine that selection by computing your actual distance for the remaining candidates.
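As a rough illustration of the refinement step, here is a minimal vectorized sketch in base R. With only about 28 attractions the full listings-by-attractions distance matrix is small, so this sketch skips the coarse Euclidean pre-selection and computes every pairwise distance at once. The function name `count_within` and the toy coordinates are hypothetical, and the haversine formula is used as a stand-in for `distVincentySphere` (both give great-circle distances on a sphere; the radius 6378137 m is an assumption meant to match the geosphere default).

```r
# Hypothetical helper: for each listing, count the attractions within
# radius_m meters, using the haversine great-circle distance on a sphere.
count_within <- function(lon, lat, attr_lon, attr_lat,
                         radius_m = 2000, r = 6378137) {
  to_rad <- pi / 180
  # Pairwise differences: rows are listings, columns are attractions
  dphi <- outer(lat, attr_lat, "-") * to_rad
  dlam <- outer(lon, attr_lon, "-") * to_rad
  a <- sin(dphi / 2)^2 +
    outer(cos(lat * to_rad), cos(attr_lat * to_rad)) * sin(dlam / 2)^2
  # Clamp to 1 to guard against rounding; the matrix goes first so that
  # pmin() keeps its dimensions
  d <- 2 * r * asin(pmin(sqrt(a), 1))
  # One count per listing
  rowSums(d <= radius_m)
}

# Toy data with approximate, made-up coordinates (illustration only)
mainData <- data.frame(longitude = c(-73.9857, -73.8000),
                       latitude  = c(40.7484, 40.6500))
ny_attractions <- data.frame(lon = c(-73.9855, -74.0445),
                             lat = c(40.7580, 40.6892))

mainData$num_of_attractions <- count_within(
  mainData$longitude, mainData$latitude,
  ny_attractions$lon, ny_attractions$lat
)
print(mainData)
```

If the geosphere package is available, `distm()` with `fun = distVincentySphere` followed by `rowSums()` on the `<= 2000` comparison should give the same kind of result. For much larger attraction sets, the coarse pre-selection described above would become worthwhile.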
Upvotes: 1