Reputation: 1817
I have two large dataframes containing various variables. I need to add a variable distance_from_capital_city, defined as follows:
One dataframe has all country names, capital cities, and their coordinates (cap_coordinances in the example below). The other dataframe has observations in the same countries, sometimes in the capital city and sometimes not.
I need to add the variable distance_from_capital_city to the real_data dataframe (in the example below), and the result should look like this:
The first 4 rows of distance_from_capital_city in real_data should be equal to zero (or some small number, because the coordinates do not have to match exactly, rounding error, etc.) and the fifth row should contain the distance from Barcelona to Madrid (matched by country). The distance should be measured in kilometers from the capital city, or as a Euclidean distance or any other suitable measure.
For example, using this function:
library(geosphere)
distm(c(lon1, lat1), c(lon2, lat2), fun = distHaversine)
I give an example of the expected result below (the numbers are for illustration only):
library(tibble)
cap_coordinances =
tribble(
~country_txt, ~city, ~longitude, ~latitude,
"Greece", "Athens", 23.8, 37.9,
"Italy", "Rome", 12.5, 41.9,
"Netherlands", "Amsterdam", 4.90, 52.4,
"Spain", "Madrid", -0.743, 41.0,
)
real_data =
tribble(
~country_txt, ~city, ~longitude, ~latitude,
"Greece", "Athens", 23.762728, 37.99749,
"Italy", "Rome", 12.490069, 41.89096,
"Netherlands", "Amsterdam", 4.90, 52.4,
"Spain", "Madrid", -0.743, 41.0,
"Spain", "Barcelona", 2.15, 41.3
)
result =
tribble(
~country_txt, ~city, ~longitude, ~latitude, ~distance_from_capital_city,
"Greece", "Athens", 23.762728, 37.99749, "0 or small number",
"Italy", "Rome", 12.490069, 41.89096, "0 or small number",
"Netherlands", "Amsterdam", 4.90, 52.4, "0 or small number",
"Spain", "Madrid", -0.743, 41.0, "0 or small number",
"Spain", "Barcelona", 2.15, 41.3, 3500
)
I cannot solve this on my own, so I would appreciate any advice.
The data I am using are public and can be downloaded here:
Upvotes: 1
Views: 332
Reputation: 887213
We can do a join by country and then calculate the distance between the corresponding 'longitude', 'latitude' columns. Note that distHaversine returns the distance in meters.
library(dplyr)
library(purrr)      # provides pmap_dbl
library(geosphere)
real_data %>%
  # attach each country's capital coordinates (.x = real_data, .y = capital)
  left_join(cap_coordinances, by = 'country_txt') %>%
  transmute(country_txt, city = city.x,
            distance = pmap_dbl(.[c('longitude.x', 'latitude.x',
                                    'longitude.y', 'latitude.y')],
                                ~ distm(c(..1, ..2), c(..3, ..4), fun = distHaversine) %>% as.vector))
# A tibble: 5 x 3
# country_txt city distance
# <chr> <chr> <dbl>
#1 Greece Athens 11335.
#2 Italy Rome 1300.
#3 Netherlands Amsterdam 0
#4 Spain Madrid 0
#5 Spain Barcelona 244775.
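If the column should be named as in the question and expressed in kilometers, a variant along these lines may work. This is only a sketch: it relies on distHaversine accepting two-column longitude/latitude matrices and computing row-wise distances, and the `_cap` suffix and the `result` name are choices made for this illustration, not part of the original data.

```r
library(tibble)
library(dplyr)
library(geosphere)

cap_coordinances <- tribble(
  ~country_txt,  ~city,       ~longitude, ~latitude,
  "Greece",      "Athens",     23.8,      37.9,
  "Italy",       "Rome",       12.5,      41.9,
  "Netherlands", "Amsterdam",   4.90,     52.4,
  "Spain",       "Madrid",     -0.743,    41.0
)

real_data <- tribble(
  ~country_txt,  ~city,       ~longitude, ~latitude,
  "Greece",      "Athens",     23.762728, 37.99749,
  "Italy",       "Rome",       12.490069, 41.89096,
  "Netherlands", "Amsterdam",   4.90,     52.4,
  "Spain",       "Madrid",     -0.743,    41.0,
  "Spain",       "Barcelona",   2.15,     41.3
)

result <- real_data %>%
  # capital columns get the (illustrative) "_cap" suffix, original names stay unchanged
  left_join(cap_coordinances, by = "country_txt", suffix = c("", "_cap")) %>%
  # distHaversine is vectorized over matrix rows and returns meters, so divide by 1000
  mutate(distance_from_capital_city =
           distHaversine(cbind(longitude, latitude),
                         cbind(longitude_cap, latitude_cap)) / 1000) %>%
  # drop the helper columns that came from the capital table
  select(-city_cap, -longitude_cap, -latitude_cap)

print(result)
```

This avoids the row-by-row pmap_dbl call, which may matter on large dataframes. It assumes every country in real_data appears exactly once in cap_coordinances; countries without a match would get NA.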
Upvotes: 2
Reputation: 2397
Here is how you would do it using sp. The sf solution would be similar, using st_distance, and you could use pipes. I just find the coercion to a spatial object more straightforward with sp. Note that since your data are in decimal degrees, setting longlat = TRUE makes spDists return great-circle distances in kilometers.
library(tibble)
library(sp)
cap_coordinances =
tribble(
~country_txt, ~city, ~longitude, ~latitude,
"Greece", "Athens", 23.8, 37.9,
"Italy", "Rome", 12.5, 41.9,
"Netherlands", "Amsterdam", 4.90, 52.4,
"Spain", "Madrid", -0.743, 41.0,
)
real_data =
tribble(
~country_txt, ~city, ~longitude, ~latitude,
"Greece", "Athens", 23.762728, 37.99749,
"Italy", "Rome", 12.490069, 41.89096,
"Netherlands", "Amsterdam", 4.90, 52.4,
"Spain", "Madrid", -0.743, 41.0,
"Spain", "Barcelona", 2.15, 41.3
)
coordinates(cap_coordinances) <- ~longitude+latitude
coordinates(real_data) <- ~longitude+latitude
d <- spDists(real_data, cap_coordinances, longlat = TRUE)
rownames(d) <- real_data$city
colnames(d) <- cap_coordinances$city
print(d)
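The matrix above holds the distance from every row of real_data to every capital. To get a single distance_from_capital_city column, one option is to pick, for each row, the column belonging to its own country. The following is a sketch under a few assumptions: it passes plain longitude/latitude matrices to spDists instead of Spatial objects, and the names `cap`, `real`, and `idx` are introduced here only for illustration.

```r
library(sp)

cap <- data.frame(
  country_txt = c("Greece", "Italy", "Netherlands", "Spain"),
  city        = c("Athens", "Rome", "Amsterdam", "Madrid"),
  longitude   = c(23.8, 12.5, 4.90, -0.743),
  latitude    = c(37.9, 41.9, 52.4, 41.0)
)

real <- data.frame(
  country_txt = c("Greece", "Italy", "Netherlands", "Spain", "Spain"),
  city        = c("Athens", "Rome", "Amsterdam", "Madrid", "Barcelona"),
  longitude   = c(23.762728, 12.490069, 4.90, -0.743, 2.15),
  latitude    = c(37.99749, 41.89096, 52.4, 41.0, 41.3)
)

# full distance matrix in km: rows are observations, columns are capitals
d <- spDists(as.matrix(real[, c("longitude", "latitude")]),
             as.matrix(cap[, c("longitude", "latitude")]),
             longlat = TRUE)

# for each observation, the column index of its own country's capital
idx <- match(real$country_txt, cap$country_txt)

# matrix indexing with (row, column) pairs extracts one distance per observation
real$distance_from_capital_city <- d[cbind(seq_len(nrow(real)), idx)]

print(real)
```

Computing the full matrix is wasteful when both tables are large, since only one entry per row is used; in that case joining the capital coordinates first and computing row-wise distances is likely the better route.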
Upvotes: 0