DotPi

Reputation: 4127

Google geocoding and haversine distance calculation in R

I am using the geocode function from the ggmap package to geocode country names, then passing the results to distHaversine from the geosphere package to calculate the distance between two countries.

Sample of my data is as follows:

              Country.Value                   Address.Country

 1:           United States                   United States 
 2:                  Cyprus                   United States 
 3:               Indonesia                   United States 
 4:                Tanzania                        Tanzania 
 5:              Madagascar                   United States 
 6:                  Belize                          Canada 
 7:               Argentina                       Argentina 
 8:                   Egypt                           Egypt 
 9:            South Africa                    South Africa 
10:                Paraguay                        Paraguay

I have also used if-else statements to try and stay within the geocoding limits set by the free Google Maps geocoder. My code is as follows:

for(i in 1:nrow(df)) {
  row<-df.cont.long[i,]

  src_lon<- 0.0
  src_lat<- 0.0
  trgt_lon<- 0.0
  trgt_lat<- 0.0  


  if((row$Country.Value=='United States')){  #Reduce geocoding requirements
    trgt_lon<- -95.7129
    trgt_lat<- 37.0902
  }
  else if((row$Address.Country=='United States')){  #Reduce Geocoding Requirements
    src_lon<- -95.7129
    src_lat<- 37.0902
  }
  else if((row$Country.Value=='Canada')){  #Reduce geocoding requirements
    trgt_lon<- -106.3468
    trgt_lat<- 56.1304
  }
  else if((row$Primary.Address.Country=='Canada')){  #Reduce Geocoding Requirements
    src_lon<- -106.3468
    src_lat<- 56.1304
  }
  else if(row$Country.Value == row$Address.Country){   #Reduce Geocoding Requirements
    # trgt<-geocode(row$Country.Value)
    # trgt_lon<-as.numeric(trgt$lon)
    # trgt_lat<-as.numeric(trgt$lat)
    # src_lon<-as.numeric(trgt$lon)
    # src_lat<-as.numeric(trgt$lat)
  }
  else{
    trgt<-geocode(row$Country.Value, output=c("latlon"))
    trgt_lon<-as.numeric(trgt$lon)
    trgt_lat<-as.numeric(trgt$lat)

    src<-geocode(row$Address.Country)
    src_lon<-as.numeric(src$lon)
    src_lat<-as.numeric(src$lat)

  }

  print(i)
  print(c(row$Address.Country, src_lon, src_lat))
  print(c(row$Country.Value, trgt_lon, trgt_lat))

  print(distHaversine( p1=c(as.numeric(src$lon), as.numeric(src$lat)), p2=c(as.numeric(trgt$lon), as.numeric(trgt$lat)) ))


}

In the output:

  1. Sometimes the geocoding is done and sometimes it is not, in which case the coordinates stay at the default of 0.0
  2. Sometimes the distance is calculated and sometimes it is not

I have no idea where the code is going wrong.

Moreover, uncommenting the lines in the branch that checks whether Country.Value and Address.Country are equal makes things even worse.

Upvotes: 0

Views: 258

Answers (1)

alistaire

Reputation: 43344

The functions you're using are vectorized, so all you really need is

library(ggmap)
library(geosphere)

distHaversine(geocode(as.character(df$Country.Value)), 
              geocode(as.character(df$Address.Country)))
# [1]        0 10432624 14978567        0 15868544  4588708        0        0        0        0

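Note that the as.character calls are there because ggmap::geocode doesn't like factors. The results make sense (this time geocoded with source = 'dsk', the Data Science Toolkit, so the values differ slightly from the Google ones above):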

df$distance <- distHaversine(geocode(as.character(df$Country.Value), source = 'dsk'), 
                             geocode(as.character(df$Address.Country), source = 'dsk'))

df
#    Country.Value Address.Country distance
# 1  United States   United States        0
# 2         Cyprus   United States 10340427
# 3      Indonesia   United States 14574480
# 4       Tanzania        Tanzania        0
# 5     Madagascar   United States 16085178
# 6         Belize          Canada  5172279
# 7      Argentina       Argentina        0
# 8          Egypt           Egypt        0
# 9   South Africa    South Africa        0
# 10      Paraguay        Paraguay        0
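
To see what "vectorized" means here without sending any geocoding requests, below is a minimal base-R sketch of the haversine formula applied to whole columns at once. The helper name haversine_m is hypothetical and is not part of geosphere; the default radius of 6378137 m is the same default distHaversine uses, and the coordinates are the hardcoded United States and Canada values from the question. It assumes both inputs are two-column matrices or data frames in (longitude, latitude) order.

```r
# Hypothetical helper (not from geosphere): vectorized haversine distance
# in metres. p1 and p2 are two-column matrices or data frames holding
# longitude and latitude in degrees, one row per point.
haversine_m <- function(p1, p2, r = 6378137) {
  p1 <- as.matrix(p1)
  p2 <- as.matrix(p2)
  to_rad <- pi / 180

  lon1 <- p1[, 1] * to_rad
  lat1 <- p1[, 2] * to_rad
  lon2 <- p2[, 1] * to_rad
  lat2 <- p2[, 2] * to_rad

  dlat <- lat2 - lat1
  dlon <- lon2 - lon1

  # Haversine of the central angle, computed for every row at once
  a <- sin(dlat / 2)^2 + cos(lat1) * cos(lat2) * sin(dlon / 2)^2

  # pmin guards against rounding pushing sqrt(a) slightly above 1
  2 * r * asin(pmin(1, sqrt(a)))
}

# Row 1: United States to United States; row 2: United States to Canada
src  <- rbind(c(-95.7129, 37.0902), c(-95.7129, 37.0902))
trgt <- rbind(c(-95.7129, 37.0902), c(-106.3468, 56.1304))

d <- haversine_m(src, trgt)
print(d)
```

Identical points give a distance of 0, which is why the rows with matching countries come out as 0 in the table above.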

Edit

If you don't want to use ggmap::geocode, tmap::geocode_OSM is another geocoding function that uses OpenStreetMap data. However, because it is not vectorized, you need to iterate over it columnwise:

distHaversine(t(sapply(df$Country.Value, function(x){tmap::geocode_OSM(x)$coords})), 
              t(sapply(df$Address.Country, function(x){tmap::geocode_OSM(x)$coords})))
# [1]        0 10448111 14794618        0 16110917  5156823        0        0        0        0

or rowwise:

apply(df, 1, function(x){distHaversine(tmap::geocode_OSM(x['Country.Value'])$coords, 
                                       tmap::geocode_OSM(x['Address.Country'])$coords)})
# [1]        0 10448111 14794618        0 16110917  5156823        0        0        0        0

In either case, subset each result to its coords element. Also note that Google, DSK, and OSM each choose a different center for each country, so the resulting distances differ somewhat between sources.
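
Since the question is also concerned with staying inside the geocoder's request limits, one option is to geocode each distinct country name only once and then map the coordinates back onto the rows. The sketch below is an assumption-laden illustration rather than a tested recipe: geocode_once and fake_geocoder are hypothetical names, and fake_geocoder is an offline stand-in that only knows the two hardcoded countries from the question. Any function that takes a character vector and returns a data frame with lon and lat columns, one row per input, could be passed instead.

```r
# Hypothetical helper: call the geocoder once per distinct name, then
# expand the result back to one row per input element.
# geocoder must map a character vector to a data frame with columns
# lon and lat, one row per name (assumed interface).
geocode_once <- function(places, geocoder) {
  places <- as.character(places)   # also converts factors safely
  uniq   <- unique(places)
  coords <- geocoder(uniq)         # one lookup per distinct name
  idx    <- match(places, uniq)    # position of each row's name in uniq
  data.frame(lon = coords$lon[idx], lat = coords$lat[idx])
}

# Offline stand-in for a real geocoder, using the coordinates hardcoded
# in the question. Unknown names come back as NA.
fake_geocoder <- function(x) {
  known <- data.frame(name = c("United States", "Canada"),
                      lon  = c(-95.7129, -106.3468),
                      lat  = c(37.0902, 56.1304),
                      stringsAsFactors = FALSE)
  known[match(x, known$name), c("lon", "lat")]
}

places <- factor(c("United States", "Canada", "United States"))
coords <- geocode_once(places, fake_geocoder)
print(coords)
```

With the sample data this would cut the lookups for Address.Country from ten to seven, since United States appears four times; the saving grows with the amount of repetition in the column.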

Upvotes: 2
