Reputation: 167
Is it possible to reduce the run time of the following code?
My goal is to get an weighted igraph object from open street data area specified by a box boundary.
Currently im trying to use the overpass api so to offload the memory load so I dont have to keep big osm files in memory.
First I get a osm data specified by a bbox (only streets) as an xml structure
library(osmdata)
library(osmar)
install.packages("remotes")
remotes::install_github("hypertidy/scgraph")
library(scgraph)
dat <- opq(bbox = c(11.68771, 47.75233, 12.35058, 48.19743 )) %>%
add_osm_feature(key = 'highway',value = c("trunk", "trunk_link", "primary","primary_link", "secondary", "secondary_link", "tertiary","tertiary_link", "residential", "unclassified" ))%>%
osmdata_xml ()
Then I convert the resulting xml object dat to an osmar object dat_osmar and finally to an igraph object:
dat_osmar <-as_osmar(xmlParse(dat))
dat_graoh <- as_igraph(dat_osmar)
How could I optimize these routines?
Maybe it is possible to separate dat (XML) object in chunks and parse it in parallell?
I go through several steps only to finally to get to a weighted non directed graph.
Currently the whole process takes 89.555 sec on my machine.
If I could cut the running time of these two stepps:
dat_osmar <-as_osmar(xmlParse(dat))
dat_graoh <- as_igraph(dat_osmar)
that would help already.
One of the approaches I tried is to use osmdata_sc() instead of osmdata_xml().
This provides an silicate object and I can convert it with:
scgraph::sc_as_igraph(dat)
to an igraph.
It is decently fast but sadly the weights are getting lost so its not a solution.
The reason for it is: if I use the conversion from osmar object to an igraph object with the function osmar::as_igraph()
the weight is calculated based on the distances between two edges and added to the igraph:
edges <- lapply(dat, function(x) {
n <- nrow(x)
from <- 1:(n - 1)
to <- 2:n
weights <- distHaversine(x[from, c("lon", "lat")], x[to,
c("lon", "lat")])
cbind(from_node_id = x[from, "ref"], to_node_id = x[to,
"ref"], way_id = x[1, "id"], weights = weights)
})
This is missing from scgraph::sc_as_igraph(dat)
If this could be added to silicate to igraph conversion
I could skip the dat_osmar <-as_osmar(xmlParse(dat))
step
and go overpass->silicate->igraph
route which is much faster istead of overpass->xml->osmar->igraph
.
osmdata package also provides a sf response with osmdata_sf()
so maybe the workflow overpass->sf->igraph
is faster but also while using this way I would need the weights incorporated into the graph based on the distance of edges and im not good enough to do it currently and would realy appreciate any help.
Furthermore the connection between openstreetmap gps points and their IDs should not be lost while using sf and resulting igraph object. Meaning I should be able to find gps position to an ID from the resulting Igraph. A lookup table would be enough. If i go overpass->silicate->igraph
or overpass->xml->osmar->igraph
routes it is possible. Im not sure if it is still would be posible with overpass->sf->igraph
route.
Upvotes: 5
Views: 898
Reputation: 3482
If you want to create a graph object starting from a network of roads in R, then I would use the following procedure.
First of all, I need to install sfnetworks
from the github repo (since we recently fixed some bugs and the newest version is not on CRAN)
remotes::install_github("luukvdmeer/sfnetworks", quiet = TRUE)
Then load packages
library(sf)
#> Linking to GEOS 3.9.0, GDAL 3.2.1, PROJ 7.2.1
library(tidygraph)
#>
#> Attaching package: 'tidygraph'
#> The following object is masked from 'package:stats':
#>
#> filter
library(sfnetworks)
library(osmdata)
#> Data (c) OpenStreetMap contributors, ODbL 1.0. https://www.openstreetmap.org/copyright
Download data from Overpass API
my_osm_data <- opq(bbox = c(11.68771, 47.75233, 12.35058, 48.19743 )) %>%
add_osm_feature(
key = 'highway',
value = c("trunk", "trunk_link", "primary","primary_link", "secondary", "secondary_link", "tertiary","tertiary_link", "residential", "unclassified")
) %>%
osmdata_sf(quiet = FALSE)
#> Issuing query to Overpass API ...
#> Rate limit: 2
#> Query complete!
#> converting OSM data to sf format
Now I extract the roads and build the sfnetwork object:
system.time({
# extract the roads
my_roads <- st_geometry(my_osm_data$osm_lines)
# build the sfnetwork object
my_sfn <- as_sfnetwork(my_roads, directed = FALSE, length_as_weight = TRUE)
})
#> user system elapsed
#> 3.03 0.16 3.28
As you can see, after downloading the OSM data, it takes just a couple of seconds to run that procedure.
At the moment I ignore all fields in my_osm_data$osm_lines
, but if you need to add some of the columns in my_osm_data$osm_lines
to my_roads
, then you can modify the previous code as follows: my_roads <- my_osm_data$osm_lines[, "relevant columns"]
.
A few details regarding the construction of the sfnetwork
object: the parameter "directed = FALSE"
specifies that we want to build an undirected graph (see the docs, here and here for more details), while the parameter length_as_weight = TRUE
says that the length of the edges will be stored in a column called "weight"
and used by igraph and tidygraph algorithms.
This is the printing of my_sfn
object:
my_sfn
#> # A sfnetwork with 33179 nodes and 28439 edges
#> #
#> # CRS: EPSG:4326
#> #
#> # An undirected multigraph with 6312 components with spatially explicit edges
#> #
#> Registered S3 method overwritten by 'cli':
#> method from
#> print.boxx spatstat.geom
#> # Node Data: 33,179 x 1 (active)
#> # Geometry type: POINT
#> # Dimension: XY
#> # Bounding box: xmin: 11.6757 ymin: 47.74745 xmax: 12.39161 ymax: 48.22025
#> x
#> <POINT [°]>
#> 1 (11.68861 47.90971)
#> 2 (11.68454 47.90937)
#> 3 (11.75216 48.17638)
#> 4 (11.75358 48.17438)
#> 5 (11.7528 48.17351)
#> 6 (11.74822 48.17286)
#> # ... with 33,173 more rows
#> #
#> # Edge Data: 28,439 x 4
#> # Geometry type: LINESTRING
#> # Dimension: XY
#> # Bounding box: xmin: 11.6757 ymin: 47.74745 xmax: 12.39161 ymax: 48.22025
#> from to x weight
#> <int> <int> <LINESTRING [°]> <dbl>
#> 1 1 2 (11.68861 47.90971, 11.6878 47.90965, 11.68653 47.90954, 1~ 306.
#> 2 3 4 (11.75216 48.17638, 11.75224 48.17626, 11.75272 48.17556, ~ 246.
#> 3 5 6 (11.7528 48.17351, 11.75264 48.17344, 11.75227 48.17329, 1~ 382.
#> # ... with 28,436 more rows
my_sfn
is an igraph object by definition:
class(my_sfn)
#> [1] "sfnetwork" "tbl_graph" "igraph"
but, if you want to be more explicit, then
as.igraph(my_sfn)
#> IGRAPH 101dcdf U-W- 33179 28439 --
#> + attr: x (v/x), x (e/x), weight (e/n)
#> + edges from 101dcdf:
#> [1] 1-- 2 3-- 4 5-- 6 7-- 8 9-- 10 11-- 12 13-- 14 15-- 16
#> [9] 17-- 18 16-- 19 20-- 21 21-- 22 23-- 24 25-- 26 27-- 28 29-- 30
#> [17] 31-- 32 33-- 34 35-- 36 37-- 38 39-- 40 41-- 42 43-- 44 45-- 46
#> [25] 14-- 47 48-- 49 50-- 51 52-- 53 54-- 55 56-- 57 36-- 58 58-- 59
#> [33] 60-- 61 62-- 63 64-- 65 66-- 67 68-- 69 70-- 71 72-- 73 74-- 75
#> [41] 76-- 77 78-- 79 80-- 81 82-- 83 84-- 85 86-- 87 88-- 89 90-- 91
#> [49] 92-- 93 94-- 95 96-- 97 98-- 99 100--101 102--103 104--105 106--107
#> [57] 108--109 110--111 112--113 80--114 115--116 117--118 119--120 121--122
#> + ... omitted several edges
You can see that the edges have a weight attribute that is equal to the length of each LINESTRING geometry:
all.equal(
target = igraph::edge_attr(as.igraph(my_sfn), "weight"),
current = as.numeric(st_length(my_roads))
)
#> [1] TRUE
Created on 2021-03-26 by the reprex package (v1.0.0)
If you want to read more details regarding sfnetworks
, then you can check the website and the introductory vignettes. That being said, I don't understand what you mean by
connection between openstreetmap gps points and their IDs should not be lost
Can you add more details with a comment or an edit to the original question? Why do you need to OSM id? And what do you mean by OSM id? I think that I need more details to expand this answer.
EDIT
I just re-read @mrhellmann 's answer, and I noticed that I forgot to convert POLYGON data to lines. Anyway, I would suggest applying osmdata::osm_poly2line()
immediately after running the code to download OSM data via Overpass API.
Upvotes: 3
Reputation: 5529
It seems like getting the xml data into another format is taking a long time. Instead of using xml, asking overpass to return an sf
object and using that might be quicker. The sf
object can then be manipulated & used by the sfnetworks
package to get a network, while retaining the spatial aspects of the network. Weights can be added by functions from sfnetworks
(or tidygraph
) packages.
I think the code below focuses on taking care of the speed and the edge weight problems. Other problems, like finding the nearest nodes or edges, can be solved using the functions of the sf
package, but are not addressed. Otherwise this becomes more than a one-off SO question.
The speed might be able to be increased, at the cost of spatial accuracy, by using st_simplify
for the edges. One problem with this approach is that stnetworks places a node where each linestring meets another. The data returned often has a single roadway split into multiple linestrings. As an example, see two longer roads in yellow on the edges plot below. Probably a solvable problem, but may not be worthwhile in this case.
library(osmdata)
#library(osmar)
library(tidyverse)
library(sf)
library(ggplot2)
library(sfnetworks)
library(tidygraph)
# get data as an sf object rather than xml
## This is the slowest part of the code.
dat_sf <- opq(bbox = c(11.68771, 47.75233, 12.35058, 48.19743 )) %>%
add_osm_feature(key = 'highway',value = c("trunk", "trunk_link", "primary","primary_link", "secondary", "secondary_link", "tertiary","tertiary_link", "residential", "unclassified" ))%>%
osmdata_sf()
# Only keep lines & polygons. Points take up too much memory &
## all seem to be on lines anyway. Change polygons to LINESTRING,
## as most seem to be roundabouts or a few odd cases.
lines_sf <- dat_sf$osm_lines %>% select(osm_id) %>% st_sf()
polys_sf <- dat_sf$osm_polygons %>% select(osm_id) %>% st_sf() %>%
st_cast('LINESTRING')
# Combine the two above sf objects into one
dat_sf_bound <- rbind(lines_sf, polys_sf)
# get an sfnetwork
dat_sf_net <- as_sfnetwork(dat_sf_bound)
# add edge weight as distance
dat_sf_net <- dat_sf_net %>%
activate(edges) %>%
mutate(weight = edge_length())
dat_sf_net
should look like:
> dat_sf_net
# An sfnetwork with 33255 nodes and 28608 edges
#
# CRS: EPSG:4326
#
# A directed multigraph with 6391 components with spatially explicit edges
#
# Edge Data: 28,608 x 4 (active)
# Geometry type: LINESTRING
# Dimension: XY
# Bounding box: xmin: 11.6757 ymin: 47.74745 xmax: 12.39161 ymax: 48.22025
from to weight geometry
<int> <int> [m] <LINESTRING [°]>
1 1 2 306.3998 (11.68861 47.90971, 11.6878 47.90965, 11.68653 47.90954, 11.68597 47.909…
2 3 4 245.9225 (11.75216 48.17638, 11.75224 48.17626, 11.75272 48.17556, 11.7528 48.175…
3 5 6 382.2423 (11.7528 48.17351, 11.75264 48.17344, 11.75227 48.17329, 11.752 48.1732,…
4 7 8 131.1373 (11.70029 47.87861, 11.70046 47.87869, 11.70069 47.87879, 11.70138 47.87…
5 9 10 252.9170 (11.75733 48.17339, 11.75732 48.17343, 11.75726 48.17357, 11.75718 48.17…
6 11 12 131.6942 (11.75582 48.17036, 11.75551 48.1707, 11.75521 48.17106, 11.75507 48.171…
# … with 28,602 more rows
#
# Node Data: 33,255 x 1
# Geometry type: POINT
# Dimension: XY
# Bounding box: xmin: 11.6757 ymin: 47.74745 xmax: 12.39161 ymax: 48.22025
geometry
<POINT [°]>
1 (11.68861 47.90971)
2 (11.68454 47.90937)
3 (11.75216 48.17638)
# … with 33,252 more rows
Plots of just edges, and of edges with nodes:
A few long roads skew the coloring, but illustrate the a single road split in two.
Edit:
Addressing the comment to select nearest edge(road) by latitude / longitude coordinates. Nodes(intersections / red dots above) do not have osm id numbers that I am aware of. Nodes are created by sfnetworks
.
Starting with an sf
object of lat/lon point as our made up gps coordinate.
# random point
gps <- sfheaders::sf_point(data.frame(x = 11.81854, y = 48.04514)) %>% st_set_crs(4326)
# nearest edge(road) to the point. dat_sf_net must have edges activated.
near_edge <- st_nearest_feature(gps, dat_sf_net %>% st_as_sf())
>near_edge
[1] 4359
> st_as_sf(dat_sf_net)[near_edge,]
Simple feature collection with 1 feature and 4 fields
Geometry type: LINESTRING
Dimension: XY
Bounding box: xmin: 11.81119 ymin: 48.02841 xmax: 11.82061 ymax: 48.06845
Geodetic CRS: WGS 84
# A tibble: 1 x 5
from to osm_id geometry weight
<int> <int> <chr> <LINESTRING [°]> [m]
1 7590 7591 24232418 (11.81289 48.02841, 11.81213 48.03014, 11.81202 48.03062, 11.81… 4576.273
p3 <- gplot() +
geom_sf(data = st_as_sf(dat_sf_net), color = 'black') +
geom_sf(data = gps, color = 'red') +
geom_sf(data = st_as_sf(dat_sf_net)[near_edge,], color = 'orange') +
coord_sf(xlim = c(11.7, 11.9), ylim = c(48, 48.1))
It looks like @agila commented above. Since he's an author of sfnetworks
, maybe he will have some insights.
Upvotes: 2