Reputation: 4127
I am using the geocode function from the ggmap package to geocode country names, then passing the results to distHaversine in the geosphere library to calculate the distance between two countries.
Sample of my data is as follows:
Country.Value Address.Country
1: United States United States
2: Cyprus United States
3: Indonesia United States
4: Tanzania Tanzania
5: Madagascar United States
6: Belize Canada
7: Argentina Argentina
8: Egypt Egypt
9: South Africa South Africa
10: Paraguay Paraguay
I have also used if-else statements to try and stay within the geocoding limits set by the free Google Maps geocoder. My code is as follows:
for(i in 1:nrow(df)) {
row<-df.cont.long[i,]
src_lon<- 0.0
src_lat<- 0.0
trgt_lon<- 0.0
trgt_lat<- 0.0
if((row$Country.Value=='United States')){ #Reduce geocoding requirements
trgt_lon<- -95.7129
trgt_lat<- 37.0902
}
else if((row$Address.Country=='United States')){ #Reduce Geocoding Requirements
src_lon<- -95.7129
src_lat<- 37.0902
}
else if((row$Country.Value=='Canada')){ #Reduce geocoding requirements
trgt_lon<- -106.3468
trgt_lat<- 56.1304
}
else if((row$Primary.Address.Country=='Canada')){ #Reduce Geocoding Requirements
src_lon<- -106.3468
src_lat<- 56.1304
}
else if(row$Country.Value == row$Address.Country){ #Reduce Geocoding Requirements
# trgt<-geocode(row$Country.Value)
# trgt_lon<-as.numeric(trgt$lon)
# trgt_lat<-as.numeric(trgt$lat)
# src_lon<-as.numeric(trgt$lon)
# src_lat<-as.numeric(trgt$lat)
}
else{
trgt<-geocode(row$Country.Value, output=c("latlon"))
trgt_lon<-as.numeric(trgt$lon)
trgt_lat<-as.numeric(trgt$lat)
src<-geocode(row$Address.Country)
src_lon<-as.numeric(src$lon)
src_lat<-as.numeric(src$lat)
}
print(i)
print(c(row$Address.Country, src_lon, src_lat))
print(c(row$Country.Value, trgt_lon, trgt_lat))
print(distHaversine( p1=c(as.numeric(src$lon), as.numeric(src$lat)), p2=c(as.numeric(trgt$lon), as.numeric(trgt$lat)) ))
}
In the output
I have no idea where the code is going wrong.
Moreover, uncommenting the lines in the branch that checks whether Country.Value and Address.Country are equal makes things even worse.
Upvotes: 0
Views: 258
Reputation: 43344
The functions you're using are vectorized, so all you really need is
library(ggmap)
library(geosphere)
distHaversine(geocode(as.character(df$Country.Value)),
geocode(as.character(df$Address.Country)))
# [1] 0 10432624 14978567 0 15868544 4588708 0 0 0 0
Note that the as.character calls are there because ggmap::geocode doesn't like factors. The results make sense:
df$distance <- distHaversine(geocode(as.character(df$Country.Value), source = 'dsk'),
geocode(as.character(df$Address.Country), source = 'dsk'))
df
# Country.Value Address.Country distance
# 1 United States United States 0
# 2 Cyprus United States 10340427
# 3 Indonesia United States 14574480
# 4 Tanzania Tanzania 0
# 5 Madagascar United States 16085178
# 6 Belize Canada 5172279
# 7 Argentina Argentina 0
# 8 Egypt Egypt 0
# 9 South Africa South Africa 0
# 10 Paraguay Paraguay 0
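Since the question is concerned with geocoding limits, one option is to geocode each distinct country only once and map the results back onto the rows. The following is a minimal sketch of that idea, not a definitive implementation. To keep it runnable offline, lookup_coords is a hypothetical stand-in that only knows the two centers hard-coded in the question; in real use it could presumably be replaced by a call such as ggmap::geocode(countries).

```r
library(geosphere)

# Toy data using only the two countries whose coordinates appear in the question.
df <- data.frame(
  Country.Value   = c("United States", "Canada", "United States"),
  Address.Country = c("United States", "United States", "Canada"),
  stringsAsFactors = FALSE
)

# Hypothetical offline stand-in for a geocoder (an assumption for this sketch).
# It returns one row of lon/lat per requested country name.
lookup_coords <- function(countries) {
  known <- data.frame(
    country = c("United States", "Canada"),
    lon     = c(-95.7129, -106.3468),
    lat     = c(37.0902, 56.1304)
  )
  known[match(countries, known$country), c("lon", "lat")]
}

# Geocode each distinct country exactly once.
countries <- unique(c(df$Country.Value, df$Address.Country))
coords <- lookup_coords(countries)

# Map the cached coordinates back onto every row of the data.
trgt <- as.matrix(coords[match(df$Country.Value, countries), ])
src  <- as.matrix(coords[match(df$Address.Country, countries), ])

# distHaversine is vectorized over the rows of the two matrices.
df$distance <- distHaversine(src, trgt)
```

With this layout the number of geocoder requests depends on the number of distinct countries rather than the number of rows.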
If you don't want to use ggmap::geocode, tmap::geocode_OSM is another geocoding function that uses OpenStreetMap data. However, because it is not vectorized, you need to iterate over it columnwise:
distHaversine(t(sapply(df$Country.Value, function(x){tmap::geocode_OSM(x)$coords})),
t(sapply(df$Address.Country, function(x){tmap::geocode_OSM(x)$coords})))
# [1] 0 10448111 14794618 0 16110917 5156823 0 0 0 0
or rowwise:
apply(df, 1, function(x){distHaversine(tmap::geocode_OSM(x['Country.Value'])$coords,
tmap::geocode_OSM(x['Address.Country'])$coords)})
# [1] 0 10448111 14794618 0 16110917 5156823 0 0 0 0
and subset to the coords data. Also note that Google, DSK, and OSM all choose different centers for each country, so the resulting distances differ somewhat between sources.
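As for why the loop in the question misbehaves, two things stand out from the code as posted. First, the if/else-if chain runs at most one branch per row, so for a row such as Cyprus / United States only the source point is filled in and the target stays at (0, 0) without ever being geocoded. Second, the final distHaversine call reads src$lon and trgt$lon, which are only assigned in the last else branch, rather than the src_lon and trgt_lon variables set in the other branches. A minimal sketch of resolving each side independently follows; known_center is a hypothetical helper for illustration, not part of ggmap or geosphere.

```r
# Hard-coded centers taken from the question; NULL means "still needs geocoding".
known_center <- function(country) {
  switch(country,
         "United States" = c(lon = -95.7129, lat = 37.0902),
         "Canada"        = c(lon = -106.3468, lat = 56.1304),
         NULL)
}

# One example row, written as a plain list for illustration.
row <- list(Country.Value = "Canada", Address.Country = "United States")

# Two independent lookups instead of a single if/else-if chain,
# so both points are always resolved for every row.
trgt <- known_center(row$Country.Value)
src  <- known_center(row$Address.Country)

# Only fall back to a real geocoder when a side is still unknown, e.g.
# if (is.null(trgt)) trgt <- unlist(ggmap::geocode(row$Country.Value))
```

The distance would then be computed from src and trgt themselves, so the values that are printed and the values that are measured cannot drift apart.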
Upvotes: 2