Reputation: 420
I have two CSV files in the format shown below.
File1.csv
Sr Lat,Long
1 52.361176,4.899779
2 52.34061,4.871195
3 52.374749,4.893847
4 52.356624,4.912281
5 52.374026,4.883685
6 52.369956,4.919778
7 52.370895,4.8703
8 52.390454,4.915024
9 52.378576,4.900253
10 52.378372,4.896219
11 52.380056,4.899697
12 52.383744,4.875805
13 52.369981,4.881528
14 52.375954,4.904786
15 52.344417,4.891211
...... 1000 rows
File2.csv
neighbourhood LAT,LONG
Bijlmer-Centrum 52.3135175, 4.9547795
Bijlmer-Oost 52.3179787, 4.9754974
Bos en Lommer 52.3807577, 4.8545966
Buitenveldert - Zuidas 52.3382516, 4.872921499999999
Centrum-Oost 51.208107, 4.4249047
Centrum-West 52.0607927, 4.4832451
De Aker - Nieuw Sloten 52.3447535, 4.811520799999999
De Baarsjes - Oud-West 52.367746, 4.854258
De Pijp - Rivierenbuurt 52.3560276, 4.9021384
........ 500 rows
I want to calculate the shortest-distance pairs between the two files (ideally sorted in descending order of distance). Each entry in File1 should be matched to its closest location in File2, i.e. no entry in File1 should be left out. As an example, take the first lat-long pair in File1, 52.361176,4.899779: I need the distance from this pair to every pair in File2, and then the same for all other entries in File1. This is the formula that I need to use (it's in Python):
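from math import asin, cos, pi, sqrt

def distance(lat1, lon1, lat2, lon2):
    # Haversine distance in km; 12742 is the Earth's mean diameter in km
    p = pi / 180  # degrees to radians
    a = 0.5 - cos((lat2 - lat1) * p) / 2 + cos(lat1 * p) * cos(lat2 * p) * (1 - cos((lon2 - lon1) * p)) / 2
    return 12742 * asin(sqrt(a))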
I'm new to R, and hence asking experts on this forum to help.
EDIT: File1 and File2 contain more entries than shown here; this is just a snippet. The original files contain more than 1000 and 500 rows respectively.
Upvotes: 1
Views: 252
Reputation: 27732
Here is a spatial join using sf; data.table::fread() is used only for creating the sample data.
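# make spatial objects (coords are given as x = longitude, y = latitude)
sf1 <- sf::st_as_sf( file1, coords = c("Long", "Lat"), crs = 4326 )
sf2 <- sf::st_as_sf( file2, coords = c("LONG", "LAT"), crs = 4326 )
# attach to every point in sf1 the attributes of its nearest point in sf2
sf::st_join( sf1, sf2, join = sf::st_nearest_feature )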
#
# Simple feature collection with 15 features and 3 fields
# geometry type: POINT
# dimension: XY
# bbox: xmin: 4.8703 ymin: 52.34061 xmax: 4.919778 ymax: 52.39045
# geographic CRS: WGS 84
# First 10 features:
# Sr neighbourhood geometry
# 1 1 De Pijp - Rivierenbuurt POINT (4.899779 52.36118)
# 2 2 Buitenveldert-Zuidas POINT (4.871195 52.34061)
# 3 3 De Pijp - Rivierenbuurt POINT (4.893847 52.37475)
# 4 4 De Pijp - Rivierenbuurt POINT (4.912281 52.35662)
# 5 5 De Pijp - Rivierenbuurt POINT (4.883685 52.37403)
# 6 6 De Pijp - Rivierenbuurt POINT (4.919778 52.36996)
# 7 7 De Baarsjes - Oud-West POINT (4.8703 52.37089)
# 8 8 De Pijp - Rivierenbuurt POINT (4.915024 52.39045)
# 9 9 De Pijp - Rivierenbuurt POINT (4.900253 52.37858)
# 10 10 De Pijp - Rivierenbuurt POINT (4.896219 52.37837)
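The join returns the nearest neighbourhood but not the distance itself. If the distance is needed as well, one possible sketch is to take the index from sf::st_nearest_feature() and pass the matched points to sf::st_distance() with by_element = TRUE. The small data frames and the names idx, dist_m and result below are only for illustration. Note that st_distance() reports metres and uses sf's own geodesic/spherical calculation, so the values may differ slightly from the haversine formula in the question.

```r
library(sf)

# two rows from each file, just to keep the sketch self-contained
file1 <- data.frame(Sr = 1:2,
                    Lat = c(52.361176, 52.34061),
                    Long = c(4.899779, 4.871195))
file2 <- data.frame(neighbourhood = c("Buitenveldert-Zuidas", "De Pijp - Rivierenbuurt"),
                    LAT = c(52.3382516, 52.3560276),
                    LONG = c(4.872921499999999, 4.9021384))

sf1 <- st_as_sf(file1, coords = c("Long", "Lat"), crs = 4326)
sf2 <- st_as_sf(file2, coords = c("LONG", "LAT"), crs = 4326)

# row index in sf2 of the nearest point for every row of sf1
idx <- st_nearest_feature(sf1, sf2)

# element-wise distance (in metres) between each point and its match
dist_m <- st_distance(sf1, sf2[idx, ], by_element = TRUE)

result <- data.frame(Sr = file1$Sr,
                     neighbourhood = file2$neighbourhood[idx],
                     dist_km = as.numeric(dist_m) / 1000)

# descending order of distance, as asked in the question
result <- result[order(-result$dist_km), ]
result
```

Every row of file1 gets exactly one match, so no entry is left out.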
sample data used
library(sf)
library(data.table)
file1 <- data.table::fread("
Sr Lat Long
1 52.361176 4.899779
2 52.34061 4.871195
3 52.374749 4.893847
4 52.356624 4.912281
5 52.374026 4.883685
6 52.369956 4.919778
7 52.370895 4.8703
8 52.390454 4.915024
9 52.378576 4.900253
10 52.378372 4.896219
11 52.380056 4.899697
12 52.383744 4.875805
13 52.369981 4.881528
14 52.375954 4.904786
15 52.344417 4.891211")
file2 <- data.table::fread(' neighbourhood LAT LONG
"Bijlmer-Centrum" 52.3135175 4.9547795
"Bijlmer-Oost" 52.3179787 4.9754974
"Bos en Lommer" 52.3807577 4.8545966
"Buitenveldert-Zuidas" 52.3382516 4.872921499999999
"Centrum-Oost" 51.208107 4.4249047
"Centrum-West" 52.0607927 4.4832451
"De Aker - Nieuw Sloten" 52.3447535 4.811520799999999
"De Baarsjes - Oud-West" 52.367746 4.854258
"De Pijp - Rivierenbuurt" 52.3560276 4.9021384')
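If the haversine formula from the question has to be used as-is rather than sf, a base R port could look roughly like the sketch below. It builds the full distance matrix with outer(), which should be fine for about 1000 x 500 points. The three-row data frames and the names d, nearest and result are assumptions made for illustration.

```r
# R port of the Python function in the question; returns km
distance <- function(lat1, lon1, lat2, lon2) {
  p <- pi / 180  # degrees to radians
  a <- 0.5 - cos((lat2 - lat1) * p) / 2 +
    cos(lat1 * p) * cos(lat2 * p) * (1 - cos((lon2 - lon1) * p)) / 2
  12742 * asin(sqrt(a))
}

# a few rows from each file, just to keep the sketch self-contained
file1 <- data.frame(Sr = 1:3,
                    Lat = c(52.361176, 52.34061, 52.374749),
                    Long = c(4.899779, 4.871195, 4.893847))
file2 <- data.frame(neighbourhood = c("Buitenveldert-Zuidas",
                                      "De Baarsjes - Oud-West",
                                      "De Pijp - Rivierenbuurt"),
                    LAT = c(52.3382516, 52.367746, 52.3560276),
                    LONG = c(4.872921499999999, 4.854258, 4.9021384))

# d[i, j] = distance from row i of file1 to row j of file2
d <- outer(seq_len(nrow(file1)), seq_len(nrow(file2)),
           function(i, j) distance(file1$Lat[i], file1$Long[i],
                                   file2$LAT[j], file2$LONG[j]))

# column index of the closest file2 entry for every file1 row
nearest <- apply(d, 1, which.min)

result <- data.frame(Sr = file1$Sr,
                     neighbourhood = file2$neighbourhood[nearest],
                     km = d[cbind(seq_len(nrow(d)), nearest)])

# descending order of distance
result <- result[order(-result$km), ]
result
```

The matrix d also holds the distance from every File1 entry to every File2 entry, in case all pairs are needed and not only the closest one.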
Upvotes: 3