Reputation: 181

Use NetworkX to find cycles in MultiDiGraph imported from shapefile

I am writing a QGIS plugin which will use the NetworkX library to manipulate and analyze stream networks. My data comes from shapefiles representing stream networks.

(arrows represent direction of stream flow)

Within this stream network are braids which are important features I need to retain. I am categorizing braid features into "simple" (two edges that share two nodes) and "complex" (more than two edges, with more than two nodes).

Simple braid example

Complex braid example

Normally, I would just use the NetworkX built-in function read_shp to import the shapefile as a DiGraph. As is evident in the examples, the "simple" braid will be considered a parallel edge in a NetworkX DiGraph, because those two edges (which share the same to and from nodes) would be collapsed into a single edge. In order to preserve these multiple edges, we wrote a function that imports a shapefile as a MultiDiGraph. Simple braids (i.e. parallel edges) are preserved by using unique keys in the edge objects (this is embedded in a class):

def _shp_to_nx(self, in_network_lyr, simplify=True, geom_attrs=True):
    """
    This is a re-purposed version of read_shp from the NetworkX library.
    :param shapelayer:
    :param simplify:
    :param geom_attrs:
    :return:
    """
    self.G = nx.MultiDiGraph()


    for f in in_network_lyr.getFeatures():

        flddata = f.attributes()
        fields = [str(fi.name()) for fi in f.fields()]

        geo = f.geometry()
        # We don't care about M or Z
        geo.geometry().dropMValue()
        geo.geometry().dropZValue()

        attributes = dict(zip(fields, flddata))
        # Add a new _FID_ field
        fid = int(f.id())
        attributes[self.id_field] = fid
        attributes['_calc_len_'] = geo.length()

        # Note:  Using layer level geometry type
        if geo.wkbType() in (QgsWKBTypes.LineString, QgsWKBTypes.MultiLineString):
            for edge in self.edges_from_line(geo, attributes, simplify, geom_attrs):
                e1, e2, attr = edge
                self.features[fid] = attr
                self.G.add_edge(tuple(e1), tuple(e2), key=attr[self.id_field], attr_dict=attr)
            self.cols = self.features[self.features.keys()[0]].keys()
        else:
            raise ImportError("GeometryType {} not supported. For now we only support LineString types.".
                              format(QgsWKBTypes.displayString(int(geo.wkbType()))))

I have already written a function to find the "simple" braid features (I just iterate through the MultiDiGraphs nodes, and find edges with more than one key). But I also need to find the "complex" braids. Normally, in a Graph, I could use the cycle_basis to find all of the "complex" braids (i.e. cycles), however, the cycle_basis method only works on un-directed Graphs, not directional graphs. But I'd rather not convert my MultiDiGraph into an un-directed Graph, as there can be unexpected results associated with that conversion (not to mention losing my edge key values).

How could I go about finding cycles which are made up of more than one edge, in a relatively time-efficient way? The stream networks I'm really working with can be quite large and complex, representing large watersheds.

Thanks!

Upvotes: 1

Answers (2)

jesselangdon

Reputation: 181

So I came up with a solution, for finding both "simple" and "complex" braids.

def get_complex_braids(self, G, attrb_field, attrb_name):
    """
    Create graph with the braid edges attributed
    :param attrb_field: name of the attribute field
    :return braid_G: graph with new attribute
    """
    if nx.is_directed(G):
        UG = nx.Graph(G)
        braid_G = nx.MultiDiGraph()
        for edge in G.edges(data=True, keys=True):
            is_edge = self.get_edge_in_cycle(edge, UG)
            if is_edge == True:
                braid_G.add_edge(*edge)
        self.update_attribute(braid_G, attrb_field, attrb_name)
        return braid_G
    else:
        print "ERROR: Graph is not directed."
        braid_complex_G = nx.null_graph()
        return braid_complex_G

def get_simple_braids(self, G, attrb_field, attrb_name):
    """
    Create graph with the simple braid edges attributed
    :param attrb_field: name of the attribute field
    :return braid_G: graph with new attribute
    """
    braid_simple_G = nx.MultiDiGraph()
    parallel_edges = []
    for e in G.edges_iter():
        keys = G.get_edge_data(*e).keys()
        if keys not in parallel_edges:
            if len(keys) == 2:
                for k in keys:
                    data = G.get_edge_data(*e, key=k)
                    braid_simple_G.add_edge(e[0], e[1], key=k, attr_dict=data)
        parallel_edges.append(keys)
    self.update_attribute(braid_simple_G, attrb_field, attrb_name)
    return braid_simple_G

Upvotes: 1

Alz

Reputation: 805

This is not a definite answer, but longer than maximum allowed characters for a comment, so I post it here anyway.

To find simple braids, you can use built-in methods G.selfloop_edges and G.nodes_with_selfloops.

I haven't heard about cycle_basis for directed graphs, can you provide a reference (e.g. scientific work)? NetworkX has simple_cycles(G) which works on directed Graphs, but it is also not useful in this case, because water does not visit any node twice (or?). I am afraid that the only way is to precisely describe the topology and then search the graph to find matching occurrences. let me clarify my point with an example. the following function should be able to identify instances of complex braids similar to your example:

def Complex_braid(G):
    res = []
    # find all nodes with out_degree greater than one:
    candidates = [n for n in G.nodes() if len(G.successors(n)) > 1]
    # find successors:
    for n in candidates:
        succ = G.successors(n)
        for s in succ:
            if len(list(nx.all_simple_paths(G,n,s))) > 1:
                all_nodes = sorted(list(nx.all_simple_paths(G,n,s)), key=len)[-1]
                res.append(all_nodes)

    return res

G = nx.MultiDiGraph()
G.add_edges_from([(0,1), (1,2), (2,3), (4,5), (1,5), (5,2)])
Complex_braid(G)
# out: [[1, 5, 2]]

but the problem actually is that complex braids can be in different topological configurations and therefore it doesn't really make sense to define all possible topological configurations, unless you can describe them with one (or few) patterns or you can find a condition that signify the presence of complex braid.

Upvotes: 0

Use NetworkX to find cycles in MultiDiGraph imported from shapefile

Answers (2)

Related Questions