csha
csha

Reputation: 59

Converting Geometry to Longitude/Latitude coordinates in R

https://data.sfgov.org/Transportation/Bike-Share-Stations/gtyg-jpkj

I am working with this dataset, and am wondering if I could convert geometry (Geom in the table) to two columns: longitude and latitude in R.

Thank you!

Upvotes: 4

Views: 14917

Answers (3)

Martin Schmelzer
Martin Schmelzer

Reputation: 23909

Yes it is. Easiest way is probably using the tidyr package. Here's the one-liner:

library(tidyr)
df <- fread("~/Downloads/Bike_Share_Stations.csv") # Read data

extract(df, Geom, into = c('Lat', 'Lon'), '\\((.*),(.*)\\)', conv = T)

The last argument is a regular expression using group matches. It is a simple pattern: it starts with a literal (. The most inner two parentheses (.*) are the two coordinates devided by the comma. Only these are extracted. The pattern ends with the corresponding literal ).

Here is a subset of the resulting data:

     UID    Site ID             Last Edited Date           Lat             Lon
  1:   1  SF-T24 S1 05/23/2016 12:00:00 AM +0000 37.7518243814  -122.426627114
  2:   2  SF-G33 S1 05/23/2016 12:00:00 AM +0000 37.7935049482  -122.392846514
  3:   3   SOMA-06A 05/23/2016 12:00:00 AM +0000 37.7897420277  -122.394678441
  4:   4  SF-T22 S5 05/23/2016 12:00:00 AM +0000 37.7512809413  -122.431836215
  5:   5  SF-R25 S4 05/23/2016 12:00:00 AM +0000 37.7567132725  -122.421038213
 ---                                                                          
103: 103     Embr-E 05/23/2016 12:00:00 AM +0000 37.8047749378  -122.403247294
104: 104  SF-N26 S1 05/23/2016 12:00:00 AM +0000 37.7682271629  -122.420291015
105: 105 Market-11B 05/23/2016 12:00:00 AM +0000 37.7922638478  -122.397066071
106: 106  SF-O27 S2 05/23/2016 12:00:00 AM +0000 37.7671609432  -122.415485214
107: 107  SF-T23 S5 05/23/2016 12:00:00 AM +0000 37.7514609421  -122.429135213

Upvotes: 2

alistaire
alistaire

Reputation: 43354

RSocrata::read.socrata and tidyr::extract make this concise:

library(tidyverse)

df <- RSocrata::read.socrata('https://data.sfgov.org/Transportation/Bike-Share-Stations/gtyg-jpkj')

df <- df %>% extract(Geom, c('lat', 'lon'), '\\((.*), (.*)\\)', convert = TRUE) 

# print nicely
df %>% select(UID, Site.ID, lat, lon) %>% as_data_frame()
#> # A tibble: 107 x 4
#>      UID    Site.ID      lat       lon
#>  * <int>      <chr>    <dbl>     <dbl>
#>  1     1  SF-T24 S1 37.75182 -122.4266
#>  2     2  SF-G33 S1 37.79350 -122.3928
#>  3     3   SOMA-06A 37.78974 -122.3947
#>  4     4  SF-T22 S5 37.75128 -122.4318
#>  5     5  SF-R25 S4 37.75671 -122.4210
#>  6     6    NOMA-2E 37.79861 -122.4008
#>  7     7  SF-L33 S4 37.77590 -122.3932
#>  8     8  SF-O24 S4 37.76623 -122.4269
#>  9     9 Market-03B 37.78099 -122.4117
#> 10    10  SF-O28 S2 37.76723 -122.4108
#> # ... with 97 more rows

Upvotes: 4

Highland
Highland

Reputation: 148

I think the Geom column already contains the latitude/longitude.

library(tidyverse)

df <- df %>% 
  mutate(Geom = gsub('[()°]', '', Geom)) %>% 
  separate(col = Geom, into = c('Latitude', 'Longitude '), sep = '\\,')

First we remove the parentheses and degree symbol with gsub('[()°]', '', geom) and replace the Geom column. We then separate the Geom column into new Latitude and Longitude columns with a comma separator sep = '\\,'.

Upvotes: 0

Related Questions