I have been using a regex to parse a GraphViz file for a specific identifier. Here is typical content from this file:
node10 [label="second-messenger-mediated signaling\nGO:0019932", fontname=Courier, ...];
node11 [label="inositol phosphate-mediated signaling\nGO:0048016", fontname=Courier, ...];
node12 [label="activation of phospholipase C activity by G-protein coupled receptor protein signaling pathway coupled to IP3 second messenger\n\
GO:0007200", fontname=Courier, ...];
node13 [label="G-protein coupled receptor protein signaling pathway\nGO:0007186", fontname=Courier, ...];
node14 [label="activation of phospholipase C activity\nGO:0007202", fontname=Courier, ...];
node15 [label="elevation of cytosolic calcium ion concentration involved in G-protein signaling coupled to IP3 second messenger\nGO:0051482", fontname=Courier, pos="798,1162", width="9.56", height="0.50"];
Since I am only interested in the node ID, the label and the GO identifier, I have used the following regex to match each line:
(node\d*)\s\[label=\"([\w\s-]*).*(GO:\d*)
I know that it's neither terribly elegant nor very efficient, but it got the job done except for the line with node12. I have tried using re.DOTALL and re.MULTILINE, but to no avail.
Can anyone help me spot the missing piece of the puzzle to make the regex also work with node12?
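A minimal sketch of why the flags alone don't help. It assumes the whole file is read into one string; a three-line excerpt from the sample above stands in for it here. Matched line by line, node12's first line has no GO id and its second line has no node id, so neither matches. re.MULTILINE only changes what ^ and $ match. re.DOTALL lets . cross newlines, but only if the regex sees the whole text, and then the greedy .* runs on to the last GO id in the text. Making it lazy (.*?) fixes that. One caveat: a node with no GO id at all would let .*? leap into the next node.

```python
import re

# Excerpt from the question; node12's label continues on the next physical
# line after a trailing backslash.
DOT_TEXT = r'''node11 [label="inositol phosphate-mediated signaling\nGO:0048016", fontname=Courier, ...];
node12 [label="activation of phospholipase C activity by G-protein coupled receptor protein signaling pathway coupled to IP3 second messenger\n\
GO:0007200", fontname=Courier, ...];
node13 [label="G-protein coupled receptor protein signaling pathway\nGO:0007186", fontname=Courier, ...];
'''

# The original greedy pattern: with DOTALL its ".*" swallows everything up to
# the LAST GO id in the text, so only one (wrong) match comes back.
GREEDY_RE = re.compile(r'(node\d+)\s\[label="([\w\s-]*).*(GO:\d+)', re.DOTALL)

# Same pattern with a lazy ".*?", so each match stops at the first GO id
# after its node, even when that id sits on the next physical line.
NODE_RE = re.compile(r'(node\d+)\s\[label="([\w\s-]*).*?(GO:\d+)', re.DOTALL)


def find_nodes(text):
    """Scan the whole text at once and return (node_id, label, go_id) tuples."""
    return [m.groups() for m in NODE_RE.finditer(text)]


# Only one tuple comes back, pairing node11 with node13's GO id.
print(GREEDY_RE.findall(DOT_TEXT))

for node_id, label, go_id in find_nodes(DOT_TEXT):
    print(node_id, go_id, label)
```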
EDIT:
Here [1] is a link to the file that contains one of those lines.
Don't reinvent the wheel.
pydot is a library that parses dot files using pyparsing.
If you match each line, node12 will be split across two lines. You should read the whole file, or iterate from one node to the next.
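A sketch of the "iterate from one node to the next" idea. It assumes, as in the sample, that a statement only continues onto the next physical line when that line ends with a backslash. This is DOT's convention for letting a double-quoted string span several lines. The helper names logical_lines and parse_nodes are hypothetical. The label class [\w\s-]* is swapped for [^"\\]* so labels containing other punctuation still match.

```python
import re

# Excerpt from the question; node12 is continued with a trailing backslash.
DOT_TEXT = r'''node11 [label="inositol phosphate-mediated signaling\nGO:0048016", fontname=Courier, ...];
node12 [label="activation of phospholipase C activity by G-protein coupled receptor protein signaling pathway coupled to IP3 second messenger\n\
GO:0007200", fontname=Courier, ...];
node13 [label="G-protein coupled receptor protein signaling pathway\nGO:0007186", fontname=Courier, ...];
'''

# Node id, label text up to the literal "\n" escape, then the GO identifier.
NODE_RE = re.compile(r'(node\d+)\s\[label="([^"\\]*)\\n(GO:\d+)')


def logical_lines(physical_lines):
    """Yield one logical line per statement, gluing backslash-continued lines."""
    buffer = ""
    for line in physical_lines:
        line = line.rstrip("\n")
        if line.endswith("\\"):
            # Drop the trailing backslash and keep reading the same statement.
            buffer += line[:-1]
            continue
        yield buffer + line
        buffer = ""
    if buffer:  # file ended in the middle of a continued statement
        yield buffer


def parse_nodes(physical_lines):
    """Return (node_id, label, go_id) tuples, one per matching statement."""
    results = []
    for line in logical_lines(physical_lines):
        match = NODE_RE.search(line)
        if match:
            results.append(match.groups())
    return results


nodes = parse_nodes(DOT_TEXT.splitlines(keepends=True))
for node_id, label, go_id in nodes:
    print(node_id, go_id, label)
```

With a real file, pass the open file object instead, e.g. `with open("graph.dot") as f: nodes = parse_nodes(f)`; the filename is a placeholder. Note that a line ending in an escaped backslash (`\\`) would be misread as a continuation. A real DOT parser handles that case properly.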