Reputation: 59
https://data.sfgov.org/Transportation/Bike-Share-Stations/gtyg-jpkj
I am working with this dataset, and am wondering if I could convert geometry (Geom in the table) to two columns: longitude and latitude in R.
Thank you!
Upvotes: 4
Views: 14917
Reputation: 23909
Yes it is. Easiest way is probably using the tidyr
package. Here's the one-liner:
library(tidyr)
df <- fread("~/Downloads/Bike_Share_Stations.csv") # Read data
extract(df, Geom, into = c('Lat', 'Lon'), '\\((.*),(.*)\\)', conv = T)
The last argument is a regular expression using group matches. It is a simple pattern: it starts with a literal (
. The most inner two parentheses (.*)
are the two coordinates devided by the comma. Only these are extracted. The pattern ends with the corresponding literal )
.
Here is a subset of the resulting data:
UID Site ID Last Edited Date Lat Lon
1: 1 SF-T24 S1 05/23/2016 12:00:00 AM +0000 37.7518243814 -122.426627114
2: 2 SF-G33 S1 05/23/2016 12:00:00 AM +0000 37.7935049482 -122.392846514
3: 3 SOMA-06A 05/23/2016 12:00:00 AM +0000 37.7897420277 -122.394678441
4: 4 SF-T22 S5 05/23/2016 12:00:00 AM +0000 37.7512809413 -122.431836215
5: 5 SF-R25 S4 05/23/2016 12:00:00 AM +0000 37.7567132725 -122.421038213
---
103: 103 Embr-E 05/23/2016 12:00:00 AM +0000 37.8047749378 -122.403247294
104: 104 SF-N26 S1 05/23/2016 12:00:00 AM +0000 37.7682271629 -122.420291015
105: 105 Market-11B 05/23/2016 12:00:00 AM +0000 37.7922638478 -122.397066071
106: 106 SF-O27 S2 05/23/2016 12:00:00 AM +0000 37.7671609432 -122.415485214
107: 107 SF-T23 S5 05/23/2016 12:00:00 AM +0000 37.7514609421 -122.429135213
Upvotes: 2
Reputation: 43354
RSocrata::read.socrata
and tidyr::extract
make this concise:
library(tidyverse)
df <- RSocrata::read.socrata('https://data.sfgov.org/Transportation/Bike-Share-Stations/gtyg-jpkj')
df <- df %>% extract(Geom, c('lat', 'lon'), '\\((.*), (.*)\\)', convert = TRUE)
# print nicely
df %>% select(UID, Site.ID, lat, lon) %>% as_data_frame()
#> # A tibble: 107 x 4
#> UID Site.ID lat lon
#> * <int> <chr> <dbl> <dbl>
#> 1 1 SF-T24 S1 37.75182 -122.4266
#> 2 2 SF-G33 S1 37.79350 -122.3928
#> 3 3 SOMA-06A 37.78974 -122.3947
#> 4 4 SF-T22 S5 37.75128 -122.4318
#> 5 5 SF-R25 S4 37.75671 -122.4210
#> 6 6 NOMA-2E 37.79861 -122.4008
#> 7 7 SF-L33 S4 37.77590 -122.3932
#> 8 8 SF-O24 S4 37.76623 -122.4269
#> 9 9 Market-03B 37.78099 -122.4117
#> 10 10 SF-O28 S2 37.76723 -122.4108
#> # ... with 97 more rows
Upvotes: 4
Reputation: 148
I think the Geom
column already contains the latitude/longitude.
library(tidyverse)
df <- df %>%
mutate(Geom = gsub('[()°]', '', Geom)) %>%
separate(col = Geom, into = c('Latitude', 'Longitude '), sep = '\\,')
First we remove the parentheses and degree symbol with gsub('[()°]', '', geom)
and replace the Geom
column. We then separate
the Geom
column into new Latitude
and Longitude
columns with a comma separator sep = '\\,'
.
Upvotes: 0