J.Sabree
J.Sabree

Reputation: 2526

make a new column for each row that meets a criteria R

I have two datasets. The first one has someone's location and their distance in miles to different destinations. The second dataset has the list of all the destinations. I want R to make a column that pulls the name of each destination that is under 1000 miles away.

Here's a sample of the first dataset:

library(tidyverse)
start_location <- tibble(location = c("Anhui China", "Amersfoort Utrecht Netherlands", "Akita Akita Japan"),
lon = c(117.92, 5.38, 140.1),
lat = c(30.60, 52.16, 39.71),
dist_beijing = c(658, 5686, 1250),
dist_shanghai = c(241, 5510, 1200),
dist_tokyo = c(1300, 5775, 280),
dist_prague = c(5173, 417, 5415), 
dist_pomezia = c(5555, 474, 5927),
dist_antwerp = c(5498, 77, 5612))

Here's the second dataset

library(tidyverse)
destinations <- tibble(destinations = c("beijing china", "shanghai china", "tokyo japan", "prague czech republic", "pomezia italy", "antwerp belgium"),
lon = c(116.4, 121.47, 139.65, 14.43, 12.50, 4.40),
lat = c(39.90, 31.23, 35.67, 50.07, 41.67, 51.22))

And here's what I want the dataset to look like:

library(tidyverse)
solution <- tibble(location = c("Anhui China", "Amersfoort Utrecht Netherlands", "Akita Akita Japan"),
lon = c(117.92, 5.38, 140.1),
lat = c(30.60, 52.16, 39.71),
nearest1 = c("shanghai china", "antwerp belgium", "tokyo japan"),
nearest2 = c("beijing china", "prague czech republic", NA),
nearest3 = c(NA, "pomezia italy", NA))

I know how to make it find the shortest distance, but I'm struggling on making it yield a column name for each one. Also, though this one had three nearest, I don't necessarily want to limit it to only 3. I just want it to make columns for each destination that is under 1000 miles.

I assume I should use case_when and pmap, but I can't figure out how to add the if statement and allow it to make multiple columns.

If it it can't make columns easily, I can also have it make one column that lists all the the destinations under 1000 miles in order (for instance, if "beijing china, shanghai china"), because then I could at least separate it out with tidyr.

Also, if possible, I'd like a tidy solution, if possible.

Thank you!!

Upvotes: 0

Views: 47

Answers (1)

iod
iod

Reputation: 7592

Here's a tidyverse solution:

result<-start_location %>% gather("destination","distance",-(1:3)) %>%
  filter(distance<=1000) %>% 
  group_by(location) %>% 
  arrange(distance) %>% 
  mutate(id=paste0("nearest",row_number())) %>% 
  select(-5) 
result$destination<-gsub("dist_","",result$destination)
result$destination<-sapply(result$destination, function(x) grep(x,destinations$destinations,value=TRUE))
result<-result %>% spread(id, destination)

# A tibble: 3 x 6
# Groups:   location [3]
  location                     lon   lat nearest1       nearest2          nearest3   
  <chr>                      <dbl> <dbl> <chr>          <chr>             <chr>      
1 Akita Akita Japan         140.    39.7 tokyo japan    NA                NA         
2 Amersfoort Utrecht Nethe~   5.38  52.2 antwerp belgi~ prague czech rep~ pomezia it~
3 Anhui China               118.    30.6 shanghai china beijing china     NA 

The key is in arranging the destinations (which have been grouped by starting location) by distance, and then assigning an id tag according to their order -- then you can spread the destinations into columns based on these id tags.

I added a couple of steps before the spread to replace the names of destination columns with the actual names of the destinations from the destinations dataframe -- that may introduce some errors if you have a destination city that's also a country's name (e.g., Mexico City) and that country also appears in another destination, so just keep that in mind.

Upvotes: 1

Related Questions