Reputation: 296
I have DF_1, which lists the cities of origin and destination, and I want to know how far apart they are (miles / km). DF_2 holds the distances between cities. How can I look up the distance for each origin/destination pair using these two data frames?
DF_1:
origin <- c('LONDON','NEW YORK','TOKIO','LONDON','RIO DE JANEIRO')
destination <- c('NEW YORK','NEW YORK','RIO DE JANEIRO','LISBON','MADRID')
DF_1 <- data.frame(origin,destination)
DF_2:
CITY <- c('NEW YORK', 'LONDON', 'SAN FRANCISCO', 'MADRID', 'LOS ANGELES', 'LISBON', 'RIO DE JANEIRO', 'MOSCOW', 'SAO PAULO', 'TOKIO')
NEW_YORK <- c(0, 700, 250, 1000, 400, 800, 430, 900, 500, 30)
LONDON <- c(700, 0, 350, 1200, 50, 110, 780, 984, 1150, 5)
SAN_FRANCISCO <- c(250, 350, 0, 200, 15, 260, 305, 412, 29, 102)
MADRID <- c(1000, 1200, 200, 0, 77, 115, 225, 318, 412, 511)
LOS_ANGELES <- c(400, 50, 15, 77, 0, 88, 819, 733, 978, 1001)
LISBON <- c(800, 110, 260, 115, 88, 0, 17, 3000, 1418, 735)
RIO_DE_JANEIRO <- c(430, 780, 305, 225, 819, 17, 0, 513, 701, 56)
MOSCOW <- c(900, 984, 412, 318, 733, 3000, 513, 0, 389, 499)
SAO_PAULO <- c(500, 1150, 29, 412, 978, 1418, 701, 389, 0, 1113)
TOKIO <- c(30, 5, 102, 511, 1001, 735, 56, 499, 1113, 0)
DF_2 <- data.frame(CITY, `NEW YORK` = NEW_YORK, LONDON, `SAN FRANCISCO` = SAN_FRANCISCO, MADRID, `LOS ANGELES` = LOS_ANGELES, LISBON, `RIO DE JANEIRO` = RIO_DE_JANEIRO, MOSCOW, `SAO PAULO` = SAO_PAULO, TOKIO, check.names = FALSE)
The result I want is this:
origin <- c('LONDON','NEW YORK','TOKIO','LONDON','RIO DE JANEIRO')
destination <- c('NEW YORK','NEW YORK','RIO DE JANEIRO','LISBON','MADRID')
distance <- c(700,0,56,110,225)
DF_FINAL <- data.frame(origin,destination,distance)
Upvotes: 3
Views: 54
Reputation: 887163
Here is an option using row/column matrix indexing in base R:
i1 <- match(DF_1$origin, DF_2$CITY)
j1 <- match(DF_1$destination, names(DF_2)[-1])
DF_1$distance <- DF_2[-1][cbind(i1, j1)]
DF_1
#          origin    destination distance
#1         LONDON       NEW YORK      700
#2       NEW YORK       NEW YORK        0
#3          TOKIO RIO DE JANEIRO       56
#4         LONDON         LISBON      110
#5 RIO DE JANEIRO         MADRID      225
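To illustrate why `DF_2[-1][cbind(i1, j1)]` returns one value per row of DF_1, here is a minimal sketch on a toy matrix. The matrix `m`, its labels, and its values are made up for illustration and are not part of the question's data. Indexing with a two-column numeric matrix treats each row as one (row, column) pair.

```r
# Toy 3 x 3 distance matrix (hypothetical values, for illustration only)
m <- matrix(c(0, 7, 3,
              7, 0, 5,
              3, 5, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

# Each row of idx is one (row, column) position to look up:
# (1, 2), (2, 2) and (3, 1)
idx <- cbind(c(1, 2, 3), c(2, 2, 1))

# A two-column index matrix picks one element per row of idx
picked <- m[idx]
picked
# [1] 7 0 3
```

In the answer above, `match()` plays the role of building `idx`: it turns the city names into row and column positions.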
Upvotes: 1
Reputation: 79238
Using base R, you could use:
transform(DF_1,distance = `rownames<-`(DF_2[,-1],DF_2[,1])[as.matrix(DF_1)])
          origin    destination distance
1         LONDON       NEW YORK      700
2       NEW YORK       NEW YORK        0
3          TOKIO RIO DE JANEIRO       56
4         LONDON         LISBON      110
5 RIO DE JANEIRO         MADRID      225
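That is, create a new data frame that uses the city names as row names, then index it with the origin/destination pairs:
DF_3 <- DF_2[, -1]           # remove the first column (CITY)
rownames(DF_3) <- DF_2$CITY  # use the city names as row names
DF_1$DISTANCE <- DF_3[as.matrix(DF_1)]  # look up each (origin, destination) pair by name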
DF_1
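The name-based lookup above can be sketched on a toy example. Here a plain matrix with dimnames is used instead of a data frame, and the names `m` and `pairs` and all values are hypothetical. A two-column character matrix is matched against the row and column names, one pair per row.

```r
# Toy 2 x 2 distance matrix with city-like labels (hypothetical values)
m <- matrix(c(0, 7,
              7, 0),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("A", "B"), c("A", "B")))

# Each row is one (origin, destination) pair, given by name
pairs <- cbind(origin = c("A", "B"), destination = c("B", "B"))

# Character-matrix indexing looks each pair up in the dimnames
dist_by_name <- m[pairs]
dist_by_name
# [1] 7 0
```

This skips the `match()` step, at the cost of requiring that every city in the pairs appears in the row and column names.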
Upvotes: 2
Reputation:
I try to do this kind of thing in the tidyverse framework. The first step is to turn the matrix of distances into "long" format. Then just join that to the original data.frame!
I suggest adding stringsAsFactors = FALSE to the end of your data.frame() definitions to avoid warning messages (from R 4.0.0 onward this is already the default).
library(tidyr)
library(dplyr)
pivot_longer(DF_2, -CITY) %>%
rename(origin = CITY, destination = name, distance = value) %>%
right_join(DF_1)
# A tibble: 5 x 3
  origin         destination    distance
  <chr>          <chr>             <dbl>
1 LONDON         NEW YORK            700
2 NEW YORK       NEW YORK              0
3 TOKIO          RIO DE JANEIRO       56
4 LONDON         LISBON              110
5 RIO DE JANEIRO MADRID              225
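For readers without the tidyverse installed, the same reshape-then-join idea can be sketched in base R. This is a swapped-in variant, not the code from this answer: it uses `as.table()` plus `as.data.frame()` for the long format and `merge()` for the join, on made-up toy data (`m`, `routes`, and all values are hypothetical).

```r
# Toy distance matrix (hypothetical values, for illustration only)
m <- matrix(c(0, 7,
              7, 0),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("A", "B"), c("A", "B")))

# Long format: one row per (origin, destination) pair
long <- as.data.frame(as.table(m), stringsAsFactors = FALSE)
names(long) <- c("origin", "destination", "distance")

# Toy stand-in for DF_1
routes <- data.frame(origin = c("A", "B"),
                     destination = c("B", "B"),
                     stringsAsFactors = FALSE)

# Join the pairs to their distances on both key columns
res <- merge(routes, long, by = c("origin", "destination"))
res
```

Note that `merge()` sorts the result by the key columns by default, so the row order may differ from the order of the input pairs.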
Upvotes: 1
Reputation: 370
This should reproduce exactly what you're looking for (using the tidyverse):
library(dplyr)  # %>%, inner_join(), filter(), select()
library(tidyr)  # gather()
DF_FINAL <- DF_1 %>%
  inner_join(DF_2, by = c("origin" = "CITY")) %>%
  gather(key = "city", value = "distance", -origin, -destination) %>%
  filter(destination == city) %>%
  select(-c(city))
DF_FINAL
|origin |destination | distance|
|:--------------|:--------------|--------:|
|LONDON |NEW YORK | 700|
|NEW YORK |NEW YORK | 0|
|RIO DE JANEIRO |MADRID | 225|
|LONDON |LISBON | 110|
|TOKIO |RIO DE JANEIRO | 56|
Upvotes: 1