papelr
papelr

Reputation: 438

Using case_when() to assign two new columns, instead of one

I have this sample data:

df <- tibble(
  "City1" = c("New York", "Boston", "Chicago"),
  "City2" = c("Chicago", "Cleveland", "Atlanta"))

Assume City1 is the starting point, and City2 is the destination. I.e., a person traveled from New York to Chicago.

I want to add a column for the starting latitude and a column for starting longitude, and also do the same for the destination city. In all, I want four new columns. I already have the coordinates.

How can I assign the coordinates? I have tried using case_when, but I am not sure how to deliver coordinates to multiple columns. It is easy to do one column:

library(tidyverse)

# The numbers after the cities are the latitudes
df <- df %>% 
  mutate(
   City1_lat = case_when(
    City1 == 'New York' ~ 40.7128,
    City1 == 'Boston' ~ 42.3601,
    City1 == 'Chicago' ~ 41.8781
  )
 )

How can I expand this to add in a City1_lon column? Trying to streamline this as much as possible, since I have several thousand rows of origins/destinations. Either a dplyr or base solution works. I would expand this for the destination cities, City2. For reference:

New York: 40.7128, 74.0060
Boston: 42.3601, 71.0589
Chicago: 41.8781, 87.6298
Cleveland: 41.4993, 81.6944
Atlanta: 33.7490, 84.3880

Upvotes: 7

Views: 2054

Answers (5)

moodymudskipper
moodymudskipper

Reputation: 47300

Here is a way to do it using mutate_all and unnest, with a bonus hack for naming the columns :

df %>% 
  mutate_all(funs(l = case_when(
      . == 'New York'  ~ list(tibble(at=40.7128, on=74.0060)),
      . == 'Boston'    ~ list(tibble(at=42.3601, on=71.0589)),
      . == 'Chicago'   ~ list(tibble(at=41.8781, on=87.6298)),
      . == 'Cleveland' ~ list(tibble(at=41.4993, on=81.6944)),
      . == 'Atlanta'   ~ list(tibble(at=33.7490, on=84.3880))
    )
  )) %>%
  unnest(.sep = "")

# # A tibble: 3 x 6
#      City1     City2 City1_lat City1_lon City2_lat City2_lon
#      <chr>     <chr>     <dbl>     <dbl>     <dbl>     <dbl>
# 1 New York   Chicago   40.7128   74.0060   41.8781   87.6298
# 2   Boston Cleveland   42.3601   71.0589   41.4993   81.6944
# 3  Chicago   Atlanta   41.8781   87.6298   33.7490   84.3880

This addresses "Using case_when() to assign two new columns".

To solve the general problem I would recommend a solution based on left joins, as it is more flexible to have your keys and values in a neat separate table.

Upvotes: 3

Diego Rojas
Diego Rojas

Reputation: 199

You should call externally a file (in my example, called as data_xy) with info with "city, lat and long", then you can use left_join. Try this code:

library(dplyr)
library(purrr)
data_xy <- tibble(city = c("New York", "Boston", "Chicago", "Cleveland", "Atlanta"),
                  lat = c(40.7128, 42.3601, 41.8781, 41.4993, 33.7490),
                  lon = c(74.0060, 71.0589, 87.6298, 81.6944, 84.3880))


df <- tibble("City1" = c("New York", "Boston", "Chicago"),
             "City2" = c("Chicago", "Cleveland", "Atlanta"))

df_latlon <- map(names(df), ~ left_join(df %>% select(.x),  data_xy, 
                                        by= structure(names = .x, .Data = "city")) )
df_latlon

The output:

> df_latlon
[[1]]
# A tibble: 3 x 3
  City1      lat   lon
  <chr>    <dbl> <dbl>
1 New York  40.7  74.0
2 Boston    42.4  71.1
3 Chicago   41.9  87.6

[[2]]
# A tibble: 3 x 3
  City2       lat   lon
  <chr>     <dbl> <dbl>
1 Chicago    41.9  87.6
2 Cleveland  41.5  81.7
3 Atlanta    33.7  84.4

Upvotes: 1

akrun
akrun

Reputation: 887038

One option is to do a left_join after creating a 'keyval' dataset

library(tidyverse)
map_dfc(names(df), ~  df %>% 
                        select(.x) %>% 
                        left_join(keyval, by = setNames('City', .x))) %>%
    select(names(df), everything())  
# A tibble: 3 x 6
#  City1    City2       lat   lon  lat1  lon1
#  <chr>    <chr>     <dbl> <dbl> <dbl> <dbl>
#1 New York Chicago    40.7  74.0  41.9  87.6
#2 Boston   Cleveland  42.4  71.1  41.5  81.7
#3 Chicago  Atlanta    41.9  87.6  33.7  84.4

If there are more columns in the original data and we are only interested in the 'City' columns, then only loop through the 'City' columns

df$journeys <- (100,200,300)
nm1 <- grep("City", names(df), value = TRUE)
map_dfc(nm1, ~  df %>% 
                     select(.x) %>% 
                     left_join(keyval, by = setNames('City', .x))) %>%  
      bind_cols(df %>% 
                  select(-one_of(nm1)))

data

keyval <- structure(list(City = c("New York", "Boston", "Chicago", "Cleveland", 
 "Atlanta"), lat = c(40.7128, 42.3601, 41.8781, 41.4993, 33.749
 ), lon = c(74.0068, 71.0589, 87.6298, 81.6944, 84.388)), row.names = c(NA, 
  -5L), class = c("tbl_df", "tbl", "data.frame"))

Upvotes: 3

thc
thc

Reputation: 9705

Here's a tidyverse solution:

library(dplyr)
library(purrr)

df <- tibble(
  "City1" = c("New York", "Boston", "Chicago"),
  "City2" = c("Chicago", "Cleveland", "Atlanta"))


df <- df %>% 
  mutate(
    City1_coords = case_when(
      City1 == 'New York' ~ list(c(40.7128,74.0060)),
      City1 == 'Boston' ~ list(c(42.3601,71.0589)),
      City1 == 'Chicago' ~ list(c(41.8781,87.6298))
    )
  ) %>% 
  mutate(City1_lat = City1_coords %>% map_dbl(~ .x[1] ),
         City1_lon = City1_coords %>% map_dbl(~ .x[2] ))

Upvotes: 2

Spacedman
Spacedman

Reputation: 94182

With your city data in a dataframe like this:

> city
       City     lat    long
1  New York 40.7128 74.0060
2    Boston 42.3601 71.0589
3   Chicago 41.8781 87.6298
4 Cleveland 41.4993 81.6944
5   Atlanta 33.7490 84.3880

Use match to lookup city names across the tables, extract the lat-long, and renaming gives this:

> setNames(city[match(df$City1, city$City), c("lat","long")],c("City1lat","City1long"))
  City1lat City1long
1  40.7128   74.0060
2  42.3601   71.0589
3  41.8781   87.6298

> setNames(city[match(df$City2, city$City), c("lat","long")],c("City2lat","City2long"))
  City2lat City2long
3  41.8781   87.6298
4  41.4993   81.6944
5  33.7490   84.3880

which you can cbind onto your original data:

> df = cbind(df, setNames(city[match(df$City1, city$City), c("lat","long")],c("City1lat","City1long")), setNames(city[match(df$City2, city$City), c("lat","long")],c("City2lat","City2long")))
> df
     City1     City2 City1lat City1long City2lat City2long
1 New York   Chicago  40.7128   74.0060  41.8781   87.6298
2   Boston Cleveland  42.3601   71.0589  41.4993   81.6944
3  Chicago   Atlanta  41.8781   87.6298  33.7490   84.3880

Upvotes: 3

Related Questions