Reputation: 2526
I have two datasets. The first one has someone's location and their distance in miles to different destinations. The second dataset has the list of all the destinations. I want R to make a column that pulls the name of each destination that is under 1000 miles away.
Here's a sample of the first dataset:
library(tidyverse)
start_location <- tibble(location = c("Anhui China", "Amersfoort Utrecht Netherlands", "Akita Akita Japan"),
lon = c(117.92, 5.38, 140.1),
lat = c(30.60, 52.16, 39.71),
dist_beijing = c(658, 5686, 1250),
dist_shanghai = c(241, 5510, 1200),
dist_tokyo = c(1300, 5775, 280),
dist_prague = c(5173, 417, 5415),
dist_pomezia = c(5555, 474, 5927),
dist_antwerp = c(5498, 77, 5612))
Here's the second dataset
library(tidyverse)
destinations <- tibble(destinations = c("beijing china", "shanghai china", "tokyo japan", "prague czech republic", "pomezia italy", "antwerp belgium"),
lon = c(116.4, 121.47, 139.65, 14.43, 12.50, 4.40),
lat = c(39.90, 31.23, 35.67, 50.07, 41.67, 51.22))
And here's what I want the dataset to look like:
library(tidyverse)
solution <- tibble(location = c("Anhui China", "Amersfoort Utrecht Netherlands", "Akita Akita Japan"),
lon = c(117.92, 5.38, 140.1),
lat = c(30.60, 52.16, 39.71),
nearest1 = c("shanghai china", "antwerp belgium", "tokyo japan"),
nearest2 = c("beijing china", "prague czech republic", NA),
nearest3 = c(NA, "pomezia italy", NA))
I know how to make it find the shortest distance, but I'm struggling on making it yield a column name for each one. Also, though this one had three nearest, I don't necessarily want to limit it to only 3. I just want it to make columns for each destination that is under 1000 miles.
I assume I should use case_when and pmap, but I can't figure out how to add the if statement and allow it to make multiple columns.
If it it can't make columns easily, I can also have it make one column that lists all the the destinations under 1000 miles in order (for instance, if "beijing china, shanghai china"), because then I could at least separate it out with tidyr.
Also, if possible, I'd like a tidy solution, if possible.
Thank you!!
Upvotes: 0
Views: 47
Reputation: 7592
Here's a tidyverse solution:
result<-start_location %>% gather("destination","distance",-(1:3)) %>%
filter(distance<=1000) %>%
group_by(location) %>%
arrange(distance) %>%
mutate(id=paste0("nearest",row_number())) %>%
select(-5)
result$destination<-gsub("dist_","",result$destination)
result$destination<-sapply(result$destination, function(x) grep(x,destinations$destinations,value=TRUE))
result<-result %>% spread(id, destination)
# A tibble: 3 x 6
# Groups: location [3]
location lon lat nearest1 nearest2 nearest3
<chr> <dbl> <dbl> <chr> <chr> <chr>
1 Akita Akita Japan 140. 39.7 tokyo japan NA NA
2 Amersfoort Utrecht Nethe~ 5.38 52.2 antwerp belgi~ prague czech rep~ pomezia it~
3 Anhui China 118. 30.6 shanghai china beijing china NA
The key is in arranging the destinations (which have been grouped by starting location) by distance, and then assigning an id
tag according to their order -- then you can spread
the destinations into columns based on these id
tags.
I added a couple of steps before the spread
to replace the names of destination columns with the actual names of the destinations from the destinations
dataframe -- that may introduce some errors if you have a destination city that's also a country's name (e.g., Mexico City) and that country also appears in another destination, so just keep that in mind.
Upvotes: 1