Reputation: 169
I have a data frame containing a number of projects with their start year and coordinates (long/lat), and another data frame containing a number of (fictional) respondents with the year they were surveyed and their coordinates:
respond_id<- c(1:5)
survey_year<- c(2007, 2005, 2008, 2004, 2005)
lat_1<- c(53.780928, 54.025200, 53.931432, 53.881048, 54.083359)
long_1<- c(9.614991, 9.349862, 9.473498, 10.685581, 10.026894)
project_id<- c(1111:1114)
year_start<- c(2007, 2007, 2006, 2008)
lat_2<- c(54.022881, 54.022881, 53.931753, 53.750523)
long_2<- c(9.381104, 9.381104, 9.505700, 9.666336)
survey<- data.frame(respond_id, survey_year, lat_1, long_1)
projects<- data.frame(project_id, year_start, lat_2, long_2)
Now I want to create a new variable survey$projects_nearby that counts the number of projects located near each respondent (here: within 5 km). The data frame survey should then look something like this (other results possible):
> survey
respond_id survey_year lat_1 long_1 projects_nearby
1 1 2007 53.780928 9.614991 0
2 2 2005 54.025200 9.349862 0
3 3 2008 53.931432 9.473498 1
4 4 2004 53.881048 10.685581 0
5 5 2005 54.083359 10.026894 0
Special attention needs to be paid to the start years of the projects and the years the surveys were conducted: if a respondent was surveyed in 2007 but the nearby project only started in 2008, that project naturally does not count as a nearby project.
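The year rule amounts to a respondents-by-projects logical matrix. A minimal base-R sketch, assuming a project counts when its start year is less than or equal to the survey year (the name `eligible` is illustrative):

```r
# Survey years of respondents 1 to 5 and start years of projects 1111 to 1114
survey_year <- c(2007, 2005, 2008, 2004, 2005)
year_start <- c(2007, 2007, 2006, 2008)

# eligible[i, j] is TRUE when project j had started by the year respondent i was surveyed
eligible <- outer(survey_year, year_start, ">=")
eligible
```

Only respondents 1 and 3 have any eligible projects (three and four, respectively), so under this rule respondents 2, 4 and 5 can never have projects nearby.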
I thought of creating a distance matrix and then, for each respondent (row), counting the distances smaller than 5 km, but I don't know how to create this distance matrix. Or would a for loop be easier? Could anyone help me or give me a hint on the code for this?
EDIT: I have updated the expected values of survey$projects_nearby. They should now match the actual number of projects located near each respondent.
Upvotes: 0
Views: 241
Reputation: 541
You can use the sp package to find the distances, and then just count the number that are nearby. That is,
library(sp)
# spDists() expects longitude in the first column and latitude in the second
survey.loc <- as.matrix(survey[, c("long_1", "lat_1")])
project.loc <- as.matrix(projects[, c("long_2", "lat_2")])
distances <- spDists(survey.loc, project.loc, longlat = TRUE)  # great-circle distances in km
survey$projects_nearby <- apply(distances, 1, function(x) sum(x < 5))
I hope this helps!
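EDIT: My apologies for not considering the date. This version only counts projects that had started by the year the respondent was surveyed:
library(sp)
# spDists() expects longitude in the first column and latitude in the second
survey.loc <- as.matrix(survey[, c("long_1", "lat_1")])
project.loc <- as.matrix(projects[, c("long_2", "lat_2")])
distances <- spDists(survey.loc, project.loc, longlat = TRUE)
# year.diff[i, j] = survey year of respondent i minus start year of project j
year.diff <- sapply(projects$year_start, function(x) survey$survey_year - x)
# Projects that had not started yet get an infinite multiplier, so they never fall below 5 km
year.diff <- ifelse(year.diff < 0, Inf, 1)
survey$projects_nearby <- apply(year.diff * distances, 1, function(x) sum(x < 5))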
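For readers without sp, the same distance-matrix idea can be sketched in base R. This is a minimal sketch, not the answerer's code: it assumes a spherical Earth (haversine formula, radius 6371 km), so distances may differ slightly from spDists(), and `haversine_km`, `distances` and `started` are illustrative names. Combining the two conditions with `&` rather than multiplying by `Inf` also avoids `Inf * 0` producing `NaN` when a respondent and a project share coordinates.

```r
# Data from the question
survey <- data.frame(
  respond_id  = 1:5,
  survey_year = c(2007, 2005, 2008, 2004, 2005),
  lat_1       = c(53.780928, 54.025200, 53.931432, 53.881048, 54.083359),
  long_1      = c(9.614991, 9.349862, 9.473498, 10.685581, 10.026894)
)
projects <- data.frame(
  project_id = 1111:1114,
  year_start = c(2007, 2007, 2006, 2008),
  lat_2      = c(54.022881, 54.022881, 53.931753, 53.750523),
  long_2     = c(9.381104, 9.381104, 9.505700, 9.666336)
)

# Assumption: haversine distance in km on a sphere of radius 6371 km
haversine_km <- function(lat1, lon1, lat2, lon2, radius = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

# distances[i, j]: km between respondent i and project j
distances <- outer(seq_len(nrow(survey)), seq_len(nrow(projects)),
  function(i, j) haversine_km(survey$lat_1[i], survey$long_1[i],
                              projects$lat_2[j], projects$long_2[j]))

# started[i, j]: project j had started by respondent i's survey year
started <- outer(survey$survey_year, projects$year_start, ">=")

# Count projects that are both within 5 km and already started
survey$projects_nearby <- rowSums(distances < 5 & started)
survey
```

On the question's data this gives 0, 0, 1, 0, 0, matching the expected output.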
Upvotes: 0
Reputation: 6522
I don't think the expected output shown in the question is correct. Below, I left_join by year so that every row of survey is replicated for every project with a matching year. Then I keep the rows where the Euclidean distance between the coordinates, measured in degrees, is below 5, count them, and join the counts back to the original survey.
The results are slightly confusing because projects 1111 and 1112 start in the same year and are at the same location; this code counts both of them.
> survey
respond_id survey_year lat_1 long_1
1 1 2007 53.78093 9.614991
2 2 2005 54.02520 9.349862
3 3 2008 53.93143 9.473498
4 4 2004 53.88105 10.685581
5 5 2005 54.08336 10.026894
> projects
project_id year_start lat_2 long_2
1 1111 2007 54.02288 9.381104
2 1112 2007 54.02288 9.381104
3 1113 2006 53.93175 9.505700
4 1114 2008 53.75052 9.666336
> left_join(survey, projects, by = c( "survey_year"="year_start")) %>%
+ dplyr::filter( sqrt((lat_1-lat_2)^2 + (long_1-long_2)^2 ) < 5) %>%
+ group_by(respond_id, survey_year, lat_1, long_1) %>%
+ summarise(projects_nearby = n()) %>%
+ right_join(survey)
Joining, by = c("respond_id", "survey_year", "lat_1", "long_1")
Source: local data frame [5 x 5]
Groups: respond_id, survey_year, lat_1 [?]
respond_id survey_year lat_1 long_1 projects_nearby
<int> <dbl> <dbl> <dbl> <int>
1 1 2007 53.78093 9.614991 2
2 2 2005 54.02520 9.349862 NA
3 3 2008 53.93143 9.473498 1
4 4 2004 53.88105 10.685581 NA
5 5 2005 54.08336 10.026894 NA
You can, of course, change NA to zero if appropriate.
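A variant of this join-and-count approach can be sketched in base R, with two changes that are assumptions about the intended rule: every respondent is paired with every project (a Cartesian product via `merge(..., by = NULL)`) and a pair is kept when `year_start <= survey_year`, instead of joining on equal years; and distance is measured in kilometres with the haversine formula (spherical Earth, radius 6371 km) instead of in degrees. `haversine_km`, `pairs` and `nearby` are illustrative names.

```r
# Data from the question
survey <- data.frame(
  respond_id  = 1:5,
  survey_year = c(2007, 2005, 2008, 2004, 2005),
  lat_1       = c(53.780928, 54.025200, 53.931432, 53.881048, 54.083359),
  long_1      = c(9.614991, 9.349862, 9.473498, 10.685581, 10.026894)
)
projects <- data.frame(
  project_id = 1111:1114,
  year_start = c(2007, 2007, 2006, 2008),
  lat_2      = c(54.022881, 54.022881, 53.931753, 53.750523),
  long_2     = c(9.381104, 9.381104, 9.505700, 9.666336)
)

# Assumption: haversine distance in km on a sphere of radius 6371 km
haversine_km <- function(lat1, lon1, lat2, lon2, radius = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Cartesian product: every respondent paired with every project (5 x 4 = 20 rows)
pairs <- merge(survey, projects, by = NULL)

# Keep pairs where the project had started by the survey year and lies within 5 km
keep <- pairs$year_start <= pairs$survey_year &
  haversine_km(pairs$lat_1, pairs$long_1, pairs$lat_2, pairs$long_2) < 5
nearby <- pairs[keep, ]

# Count per respondent; using all IDs as factor levels gives 0 instead of NA
survey$projects_nearby <- as.integer(
  table(factor(nearby$respond_id, levels = survey$respond_id)))
survey
```

Because the counts come from a factor with every respondent ID as a level, respondents without nearby projects get 0 directly; on the question's data the result is 0, 0, 1, 0, 0.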
Upvotes: 1
Reputation: 597
I think you have to convert your lat/long coordinates either to coordinates in a plane or to great-circle distances, using the haversine formula from this previous post:
https://stackoverflow.com/questions/27928/calculate-distance-between-two-latitude-longitude-points-haversine-formula
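The haversine formula from that post translates directly to vectorised R. A minimal sketch assuming a spherical Earth with mean radius 6371 km (an approximation; the function name `haversine_km` is illustrative):

```r
# Haversine great-circle distance in kilometres, assuming a spherical Earth
haversine_km <- function(lat1, lon1, lat2, lon2, radius = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Respondent 3 vs. project 1113 from the question
haversine_km(53.931432, 9.473498, 53.931753, 9.505700)
```

For respondent 3 and project 1113 this returns roughly 2.1 km. Because the function is vectorised, one respondent's coordinates can be passed together with all project coordinates at once.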
Once you have the distances to the locations in the projects data frame, you can find nearby points using knn (k-nearest neighbours) or any other technique you prefer.
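Since the question only needs a count within a fixed radius, a fixed-radius search (rather than knn, which returns a fixed number of neighbours) is enough, and the for loop the question asks about can be sketched like this. It assumes the haversine distance on a sphere of radius 6371 km; `haversine_km` is an illustrative helper, not a package function.

```r
# Data from the question
survey <- data.frame(
  respond_id  = 1:5,
  survey_year = c(2007, 2005, 2008, 2004, 2005),
  lat_1       = c(53.780928, 54.025200, 53.931432, 53.881048, 54.083359),
  long_1      = c(9.614991, 9.349862, 9.473498, 10.685581, 10.026894)
)
projects <- data.frame(
  project_id = 1111:1114,
  year_start = c(2007, 2007, 2006, 2008),
  lat_2      = c(54.022881, 54.022881, 53.931753, 53.750523),
  long_2     = c(9.381104, 9.381104, 9.505700, 9.666336)
)

# Assumption: haversine distance in km on a sphere of radius 6371 km
haversine_km <- function(lat1, lon1, lat2, lon2, radius = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

survey$projects_nearby <- 0L
for (i in seq_len(nrow(survey))) {
  # Distances from respondent i to every project (vectorised over projects)
  d <- haversine_km(survey$lat_1[i], survey$long_1[i],
                    projects$lat_2, projects$long_2)
  # Only projects that had started by the survey year and lie within 5 km count
  started <- projects$year_start <= survey$survey_year[i]
  survey$projects_nearby[i] <- sum(d < 5 & started)
}
survey
```

The loop handles one respondent at a time, so it never builds the full respondents-by-projects matrix; for data this small, the loop and the matrix approach are equally fine.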
Upvotes: 0