wont_compile

Reputation: 866

Parsing graph data file with Python

I have one relatively small issue, but I can't seem to wrap my head around it. I have a text file with information about a graph. The first line holds the number of nodes, and then each node is described by a block separated by blank lines: the node ID, the node type, the number of up edges, the up edges on one line (if there are any), the number of down edges, and the down edges on one line (if there are any).

So, sample data with three nodes is:

3

1
1
2
2 3
0

2
1
0
2
1 3

3
2
1
1
1
2

So, node 1 has type 1, two up edges (2 and 3), and no down edges. Node 2 has type 1, no up edges, and two down edges (1 and 3). Node 3 has type 2, one up edge (1), and one down edge (2).

This info is clearly readable by a human, but I am having issues writing a parser to take this information and store it in a usable form.

I have written some sample code:

f = open('C:\\data', 'r')
lines = f.readlines()
num_of_nodes = int(lines[0])
nodes = {}
counter = 0
for line in lines[1:]:
    if line == "\n":
        # a blank line marks the start of the next node's block
        counter += 1
        nodes[counter] = []
        continue
    nodes[counter].append(line.replace("\n", ""))

Which kinda gets me the info split for each node. I would like something like a dictionary that holds the ID and the up and down neighbors for each node (or False if there are none). I suppose I could now parse through this list of nodes again and handle each one on its own, but I am wondering whether I can modify the loop I already have to do that nicely in the first place.
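
For the sample above, I'd expect something along these lines (the key names are just for illustration, not a fixed requirement):

{1: {'up': [2, 3], 'down': False},
 2: {'up': False, 'down': [1, 3]},
 3: {'up': [1], 'down': [2]}}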

Upvotes: 0

Views: 2707

Answers (2)

bruno desthuilliers

Reputation: 77902

Is that what you want?

{1: {'downs': [], 'ups': [2, 3], 'node_type': 1}, 
 2: {'downs': [1, 3], 'ups': [], 'node_type': 1}, 
 3: {'downs': [2], 'ups': [1], 'node_type': 2}}

Then here's the code:

def parse_chunk(chunk):
    # chunk is the list of non-blank lines describing one node
    node_id = int(chunk[0])
    node_type = int(chunk[1])

    nb_up = int(chunk[2])
    if nb_up:
        ups = list(map(int, chunk[3].split()))
        next_pos = 4
    else:
        ups = []
        next_pos = 3

    nb_down = int(chunk[next_pos])
    if nb_down:
        downs = list(map(int, chunk[next_pos + 1].split()))
    else:
        downs = []

    return node_id, dict(
        node_type=node_type,
        ups=ups,
        downs=downs
        )

def collect_chunks(lines):
    # group the stream into chunks of non-blank lines,
    # using blank lines as separators
    chunk = []
    for line in lines:
        line = line.strip()
        if line:
            chunk.append(line)
        else:
            yield chunk
            chunk = []
    if chunk:
        yield chunk

def parse(stream):
    nb_nodes = int(next(stream).strip())
    if not nb_nodes:
        return {}
    next(stream)  # skip the blank line after the node count
    return dict(parse_chunk(chunk) for chunk in collect_chunks(stream))

def main(*args):
    with open(args[0], "r") as f:
        print(parse(f))

if __name__ == "__main__":
    import sys
    main(*sys.argv[1:])
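
Assuming you save this as, say, parse_graph.py (the file name here is just an example), you run it with the path to your data file as the argument and it prints the dict shown above:

python parse_graph.py C:\data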

Upvotes: 2

puredevotion

Reputation: 1195

I would do it as presented below. I would add a try/except around the file reading, and read the file with a with statement:

nodes = {}
with open(node_file, 'r', encoding='utf-8') as file:
    file.readline()                              # skip first line, not a node
    for line in file:                            # read the remaining lines one by one
        if line == "\n":                         # a blank line starts a new node block
            line = file.readline()               # read next line: the node ID
            counter = int(line)
            nodes[counter] = {}                  # create a nested dict per node
            line = file.readline()
            nodes[counter]['type'] = int(line)   # add node type
            line = file.readline()               # number of up edges
            if line.strip() != '0':
                line = file.readline()           # there are many ways
                up_edges = line.split()          # you can store edges
                nodes[counter]['up'] = up_edges  # here a list
                line = file.readline()           # number of down edges
            else:
                line = file.readline()           # number of down edges
            if line.strip() != '0':
                line = file.readline()
                down_edges = line.split()        # store down-edges as a list
                nodes[counter]['down'] = down_edges
            # end of chunk/node-set, let the for loop read the next line
        else:
            print("this should never happen! line: ", line)

This reads the file line by line. I'm not sure how big your data files are, but this approach is easier on your memory; the trade-off is that it can be slower in terms of HDD reading (although an SSD does miracles).

Haven't tested the code, but the concept is clear :)
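
One small note, since the question asks for False when a node has no neighbors: 'up' and 'down' are only set when edges exist, so when you read the result you can fall back to False with dict.get, for example:

up = nodes[3].get('up', False)    # ['1'] for node 3, False if no up edges were stored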

Upvotes: 1
