luuke

Reputation: 135

Python regex matching problem

I have been parsing a GraphViz file for a specific identifier using a regex. Here is some typical content from this file:

node10 [label="second-messenger-mediated signaling\nGO:0019932", fontname=Courier, ...];

node11 [label="inositol phosphate-mediated signaling\nGO:0048016", fontname=Courier, ...];

node12 [label="activation of phospholipase C activity by G-protein coupled receptor protein signaling pathway coupled to IP3 second messenger\n\

GO:0007200", fontname=Courier, ...];

node13 [label="G-protein coupled receptor protein signaling pathway\nGO:0007186", fontname=Courier, ...];

node14 [label="activation of phospholipase C activity\nGO:0007202", fontname=Courier, ...];

node15 [label="elevation of cytosolic calcium ion concentration involved in G-protein signaling coupled to IP3 second messenger\nGO:0051482", fontname=Courier, pos="798,1162", width="9.56", height="0.50"];

Since I am only interested in the node ID, the label, and the GO identifier, I have used the following regex to match each line:

(node\d*)\s\[label=\"([\w\s-]*).*(GO:\d*)

I know that it's neither terribly elegant nor very efficient, but it gets the job done except for the line with node12. I have tried using re.DOTALL and re.MULTILINE, but to no avail.

Can anyone help me spot the missing piece of the puzzle to make the regex also work with node12?

**EDIT:**

Here [1] is a link to the file that contains one of those lines.

[1] http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?session_id=7924amigo1292519756&term=GO:0051482&format=dot

Upvotes: 0

Views: 139

Answers (3)

Katriel

Reputation: 123622

Don't reinvent the wheel.

pydot is a library which parses dot files using pyparsing.

Upvotes: 3

Ant

Reputation: 5414

If you match line by line, node12 is split across two lines, so the regex never sees the whole statement at once. You should read the whole file into one string, or iterate from one node statement to the next.
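As a rough sketch of the "read the whole file" approach, the snippet below runs one compiled pattern over the full text with `re.finditer` instead of matching line by line. The sample string is copied from the question; `NODE_RE` and `parse_nodes` are names made up for this illustration, and the pattern is a reworked version of the one in the question, not the asker's original. It assumes the GO identifier always follows the label text, separated by a literal `\n` escape and optionally a backslash-newline.

```python
import re

# Sample copied from the question. A raw string keeps the literal "\n" escape
# and the trailing backslash before the line break exactly as they appear in
# the .dot file. In practice this text would come from open(path).read().
DOT_TEXT = r'''node11 [label="inositol phosphate-mediated signaling\nGO:0048016", fontname=Courier, ...];
node12 [label="activation of phospholipase C activity by G-protein coupled receptor protein signaling pathway coupled to IP3 second messenger\n\
GO:0007200", fontname=Courier, ...];
node13 [label="G-protein coupled receptor protein signaling pathway\nGO:0007186", fontname=Courier, ...];
'''

# Hypothetical pattern for this sketch (not the regex from the question).
NODE_RE = re.compile(
    r'(node\d+)\s*\[label="'  # group 1: node id
    r'([^"\\]*)'              # group 2: label text, up to the first backslash
    r'(?:\\n|\\\r?\n)+'       # literal "\n" escape and/or backslash + real newline
    r'(GO:\d+)'               # group 3: GO identifier
)


def parse_nodes(text):
    """Return (node_id, label, go_id) tuples found anywhere in text."""
    return [m.groups() for m in NODE_RE.finditer(text)]


for node_id, label, go_id in parse_nodes(DOT_TEXT):
    print(node_id, go_id, label)
```

Because the separator group explicitly allows a backslash followed by a real newline, neither `re.DOTALL` nor `re.MULTILINE` is needed here.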

Upvotes: 2

Mitro

Reputation: 1616

Is the \ after the first line of node12 escaping the line ending?
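If so (the DOT language does allow a double-quoted string to continue onto the next physical line when a backslash immediately precedes the newline), one option is to join continued lines first and then keep matching line by line. The sketch below does that with the regex from the question unchanged; `join_continuations` and `parse_lines` are hypothetical helper names, and the sample text is copied from the question.

```python
import re

# The regex from the question, unchanged.
LINE_RE = re.compile(r'(node\d*)\s\[label=\"([\w\s-]*).*(GO:\d*)')

# Sample copied from the question; the raw string preserves the backslashes.
SAMPLE = r'''node12 [label="activation of phospholipase C activity by G-protein coupled receptor protein signaling pathway coupled to IP3 second messenger\n\
GO:0007200", fontname=Courier, ...];
node13 [label="G-protein coupled receptor protein signaling pathway\nGO:0007186", fontname=Courier, ...];
'''


def join_continuations(text):
    """Remove every backslash that is immediately followed by a line break."""
    return re.sub(r'\\\r?\n', '', text)


def parse_lines(text):
    """Apply the per-line regex after continued lines have been joined."""
    rows = []
    for line in join_continuations(text).splitlines():
        match = LINE_RE.match(line)
        if match:  # skip lines that are not node statements
            rows.append(match.groups())
    return rows


for row in parse_lines(SAMPLE):
    print(row)
```

This keeps the original per-line logic intact and only adds a preprocessing step, which may be the smaller change if the rest of the script already works line by line.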

Upvotes: 1
