Reputation: 866
I have one relatively small issue, but I can't seem to wrap my head around it. I have a text file which holds information about a graph. The first line is the number of nodes; then, for each node (nodes are separated by blank lines), there is the node's ID, its type, the number of up edges followed by a line listing them (omitted when the count is 0), and the number of down edges followed by a line listing them (likewise omitted when 0).
So, sample data with three nodes is:
3

1
1
2
2 3
0

2
1
0
2
1 3

3
2
1
1
1
2
So, node 1 has type 1, two up edges (2 and 3), and no down edges. Node 2 has type 1, zero up edges, and two down edges (1 and 3). Node 3 has type 2, one up edge (1), and one down edge (2).
This info is clearly readable by human, but I am having issues writing a parser to take this information and store it in usable form.
I have written a sample code:
f = open('C:\\data', 'r')
lines = f.readlines()
num_of_nodes = lines[0]
nodes = {}
counter = 0
skip_next = False
for line in lines[1:]:
    new = False
    left = False
    right = False
    if line == "\n":
        counter += 1
        nodes[counter] = []
        new = True
        continue
    nodes[counter].append(line.replace("\n", ""))
This kinda gets me the info split for each node. I would like something like a dictionary which holds the ID and the up and down neighbors of each node (or False if there are none available). I suppose I could now parse through this list of nodes again and handle each one on its own, but I am wondering whether I can modify the loop I have so it does that nicely in the first place.
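For reference, the second pass I have in mind would look something like this (a sketch only; chunk_to_record is a hypothetical name, and it assumes each per-node list follows the layout described above: ID, type, up-edge count, optional up-edge line, down-edge count, optional down-edge line):

```python
def chunk_to_record(chunk):
    """Turn one per-node list of lines into (node_id, info_dict).

    Assumed layout: ID, type, up-edge count, [up-edge line],
    down-edge count, [down-edge line].
    """
    node_id = int(chunk[0])
    node_type = int(chunk[1])
    pos = 2
    edges = {}
    for key in ("up", "down"):
        count = int(chunk[pos])
        pos += 1
        if count:
            edges[key] = [int(x) for x in chunk[pos].split()]
            pos += 1
        else:
            edges[key] = False  # False when no edges are available
    return node_id, {"type": node_type, **edges}

# Example: node 1's chunk from the sample data
print(chunk_to_record(["1", "1", "2", "2 3", "0"]))
# → (1, {'type': 1, 'up': [2, 3], 'down': False})
```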
Upvotes: 0
Views: 2707
Reputation: 77902
Is this what you want?
{1: {'downs': [], 'ups': [2, 3], 'node_type': 1},
 2: {'downs': [1, 3], 'ups': [], 'node_type': 1},
 3: {'downs': [2], 'ups': [1], 'node_type': 2}}
Then here's the code:
def parse_chunk(chunk):
    node_id = int(chunk[0])
    node_type = int(chunk[1])
    nb_up = int(chunk[2])
    if nb_up:
        ups = list(map(int, chunk[3].split()))
        next_pos = 4
    else:
        ups = []
        next_pos = 3
    nb_down = int(chunk[next_pos])
    if nb_down:
        downs = list(map(int, chunk[next_pos + 1].split()))
    else:
        downs = []
    return node_id, dict(
        node_type=node_type,
        ups=ups,
        downs=downs
        )

def collect_chunks(lines):
    chunk = []
    for line in lines:
        line = line.strip()
        if line:
            chunk.append(line)
        else:
            yield chunk
            chunk = []
    if chunk:
        yield chunk

def parse(stream):
    nb_nodes = int(next(stream).strip())
    if not nb_nodes:
        return {}
    next(stream)  # skip the blank line after the node count
    return dict(parse_chunk(chunk) for chunk in collect_chunks(stream))

def main(*args):
    with open(args[0], "r") as f:
        print(parse(f))

if __name__ == "__main__":
    import sys
    main(*sys.argv[1:])
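Once you have the result, lookups are straightforward. For example, using the dict shown above:

```python
graph = {
    1: {'downs': [], 'ups': [2, 3], 'node_type': 1},
    2: {'downs': [1, 3], 'ups': [], 'node_type': 1},
    3: {'downs': [2], 'ups': [1], 'node_type': 2},
}

# Up-neighbors of node 1
print(graph[1]['ups'])  # → [2, 3]

# All nodes of type 1
print([n for n, info in graph.items() if info['node_type'] == 1])  # → [1, 2]
```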
Upvotes: 2
Reputation: 1195
I would do it as presented below. I would add a try/except around the file reading, and read your files with the with-statement:
nodes = {}
counter = 0
with open(node_file, 'r', encoding='utf-8') as file:
    file.readline()                          # skip first line, not a node
    for line in file:
        if line == "\n":
            line = file.readline()           # read next line: the node ID
            counter = int(line)
            nodes[counter] = {}              # create a nested dict per node
            line = file.readline()
            nodes[counter]['type'] = int(line)   # add node type
            line = file.readline()               # number of up-edges
            if line.strip() != '0':
                line = file.readline()           # there are many ways
                up_edges = line.split()          # you can store edges,
                nodes[counter]['up'] = up_edges  # here a list
                line = file.readline()           # number of down-edges
            else:
                nodes[counter]['up'] = []        # no up-edges
                line = file.readline()           # number of down-edges
            if line.strip() != '0':
                line = file.readline()
                down_edges = line.split()        # store down-edges as a list
                nodes[counter]['down'] = down_edges
            else:
                nodes[counter]['down'] = []      # no down-edges
            # end of chunk/node-set, let the for-loop read the next line
This reads the file one line at a time. I'm not sure how large your data files are, but this approach is easy on your memory; the trade-off is more HDD reading, so it can be slower (although an SSD does miracles).
I haven't tested the code, but the concept should be clear :)
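One easy way to test a line-based parser like this without touching the disk is io.StringIO, which behaves like an open text file. A minimal sketch (first_node_id is just a stand-in for the parsing loop above, reading only up to the first node's ID):

```python
import io

# The sample data from the question, as a string
SAMPLE = "3\n\n1\n1\n2\n2 3\n0\n\n2\n1\n0\n2\n1 3\n"

def first_node_id(file):
    """Stand-in for the parsing loop: skip the header, find the first node ID."""
    file.readline()          # node count, not a node
    for line in file:
        if line == "\n":     # blank line announces a new node chunk
            return int(file.readline())

with io.StringIO(SAMPLE) as fake_file:
    print(first_node_id(fake_file))  # → 1
```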
Upvotes: 1