wont_compile

Reputation: 866

Parsing graph data file with Python

I have one relatively small issue, but I can't seem to wrap my head around it. I have a text file with information about a graph. The first line holds the number of nodes, and then each node is described by a block separated by blank lines: the node ID, the node type, the number of up edges, the up edges on one line (if there are any), the number of down edges, and the down edges on one line (if there are any).

So, sample data with three nodes is:

3

1
1
2
2 3
0

2
1
0
2
1 3

3
2
1
1
1
2

So, node 1 has type 1, two up edges (2 and 3), and no down edges. Node 2 has type 1, no up edges, and two down edges (1 and 3). Node 3 has type 2, one up edge (1), and one down edge (2).

This info is clearly readable by a human, but I am having issues writing a parser to take this information and store it in a usable form.

I have written some sample code:

f = open('C:\\data', 'r')
lines = f.readlines()
num_of_nodes = int(lines[0])
nodes = {}
counter = 0
for line in lines[1:]:
    if line == "\n":
        # a blank line marks the start of the next node's block
        counter += 1
        nodes[counter] = []
        continue
    nodes[counter].append(line.replace("\n", ""))

Which kinda gets me the info split for each node. I would like something like a dictionary that holds the ID and the up and down neighbors for each node (or False if there are none). I suppose I could now parse through this list of nodes again and handle each one on its own, but I am wondering whether I can modify the loop I already have to do that nicely in the first place.
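
For the sample above, I'd expect something along these lines (the key names are just for illustration, not a fixed requirement):

{1: {'up': [2, 3], 'down': False},
 2: {'up': False, 'down': [1, 3]},
 3: {'up': [1], 'down': [2]}}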

Upvotes: 0

Views: 2707

Answers (2)

bruno desthuilliers

Reputation: 77902

Is that what you want?

{1: {'downs': [], 'ups': [2, 3], 'node_type': 1}, 
 2: {'downs': [1, 3], 'ups': [], 'node_type': 1}, 
 3: {'downs': [2], 'ups': [1], 'node_type': 2}}

Then here's the code:

def parse_chunk(chunk):
    # chunk is the list of non-blank lines describing one node
    node_id = int(chunk[0])
    node_type = int(chunk[1])

    nb_up = int(chunk[2])
    if nb_up:
        ups = list(map(int, chunk[3].split()))
        next_pos = 4
    else:
        ups = []
        next_pos = 3

    nb_down = int(chunk[next_pos])
    if nb_down:
        downs = list(map(int, chunk[next_pos + 1].split()))
    else:
        downs = []

    return node_id, dict(
        node_type=node_type,
        ups=ups,
        downs=downs
        )

def collect_chunks(lines):
    # group the stream into chunks of non-blank lines,
    # using blank lines as separators
    chunk = []
    for line in lines:
        line = line.strip()
        if line:
            chunk.append(line)
        else:
            yield chunk
            chunk = []
    if chunk:
        yield chunk

def parse(stream):
    nb_nodes = int(next(stream).strip())
    if not nb_nodes:
        return {}
    next(stream)  # skip the blank line after the node count
    return dict(parse_chunk(chunk) for chunk in collect_chunks(stream))

def main(*args):
    with open(args[0], "r") as f:
        print(parse(f))

if __name__ == "__main__":
    import sys
    main(*sys.argv[1:])
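
Assuming you save this as, say, parse_graph.py (the file name here is just an example), you run it with the path to your data file as the argument and it prints the dict shown above:

python parse_graph.py C:\data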

Upvotes: 2

puredevotion

Reputation: 1195

I would do it as presented below. I would add a try/except around the file reading, and read the file with a with statement:

nodes = {}
with open(node_file, 'r', encoding='utf-8') as file:
    file.readline()                              # skip first line, not a node
    for line in file:                            # read the remaining lines one by one
        if line == "\n":                         # a blank line starts a new node block
            line = file.readline()               # read next line: the node ID
            counter = int(line)
            nodes[counter] = {}                  # create a nested dict per node
            line = file.readline()
            nodes[counter]['type'] = int(line)   # add node type
            line = file.readline()               # number of up edges
            if line.strip() != '0':
                line = file.readline()           # there are many ways
                up_edges = line.split()          # you can store edges
                nodes[counter]['up'] = up_edges  # here a list
                line = file.readline()           # number of down edges
            else:
                line = file.readline()           # number of down edges
            if line.strip() != '0':
                line = file.readline()
                down_edges = line.split()        # store down-edges as a list
                nodes[counter]['down'] = down_edges
            # end of chunk/node-set, let the for loop read the next line
        else:
            print("this should never happen! line: ", line)

This reads the file line by line. I'm not sure how big your data files are, but this approach is easier on your memory; the trade-off is that it can be slower in terms of HDD reading (although an SSD does miracles).

Haven't tested the code, but the concept is clear :)
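
One small note, since the question asks for False when a node has no neighbors: 'up' and 'down' are only set when edges exist, so when you read the result you can fall back to False with dict.get, for example:

up = nodes[3].get('up', False)    # ['1'] for node 3, False if no up edges were stored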

Upvotes: 1
