Coyttl

Reputation: 534

Matching multiple lines up until a separator line?

I'm teaching myself some regex while trying to parse a datasheet, and I suspect there's no easy way to do this in regex alone (in C#, sure!). Say I have a file with these lines:

AA0000  One Token    -  Value
AA0000  Another Token-  Another Value
AA0000  YA Token     -  Yet Another
AA0000  Yes, Another -  Even More
AA0000
AA0000  ______________________________________________________________________
AA0000  This line - while it will match the regex, shouldn't.

So I have a simple multiline regex:

^\s*[A-Z]{2}[0-9]{4}\s\s*(?<token>.*?)\-(?<value>.*?)$

This captures each token into the 'token' group and each value into the 'value' group. Pretty simple! However, the regex ALSO matches the bottom line, putting 'This line' into the token and 'while it will [...]' into the value.
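To see the problem concretely, here is a minimal sketch in Java rather than C#. Java's regex flavor accepts the same (?<name>...) groups and inline (?m) flag as .NET, so for this input the pattern carries over unchanged apart from the doubled backslashes of a Java string literal. The sample lines use an AA0000 prefix, an assumption made so that the [A-Z]{2}[0-9]{4} part of the pattern actually matches them; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Reproduces the problem: the question's pattern also matches the line
// that comes after the separator.
class NaiveTokenMatch {
    // The question's pattern, with multiline mode switched on inline via (?m).
    static final Pattern LINE = Pattern.compile(
        "(?m)^\\s*[A-Z]{2}[0-9]{4}\\s\\s*(?<token>.*?)\\-(?<value>.*?)$");

    // Assumed sample input: AA0000 prefix so [A-Z]{2}[0-9]{4} matches,
    // separator shortened for readability.
    static final String SAMPLE = String.join("\n",
        "AA0000  One Token    -  Value",
        "AA0000  Another Token-  Another Value",
        "AA0000  YA Token     -  Yet Another",
        "AA0000  Yes, Another -  Even More",
        "AA0000",
        "AA0000  ______________________________",
        "AA0000  This line - while it will match the regex, shouldn't.");

    // Collects "token=value" for every match, trimming surrounding blanks.
    static List<String> pairs(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = LINE.matcher(text);
        while (m.find()) {
            result.add(m.group("token").trim() + "=" + m.group("value").trim());
        }
        return result;
    }

    public static void main(String[] args) {
        // The last entry printed comes from the line below the separator.
        pairs(SAMPLE).forEach(System.out::println);
    }
}
```

Running it prints five pairs; the fifth is the unwanted "This line" match.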

Essentially, I'd like the regex to match only the lines above the ____ separator line. Is this possible with regex alone, or will I need to .Split() the incoming string on the ____ separator line first?
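For comparison, the split-first route might look like this minimal Java sketch, where Pattern.split plays a role similar to .NET's Regex.Split. It assumes the separator line is a six-character prefix followed by blanks and a run of underscores (the shape the lookahead in the first answer also uses), and that the sample prefix is AA0000 so the question's pattern matches; class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// "Split first" sketch: keep only the text before the separator line,
// then run the question's unchanged pattern over that part.
class SplitThenMatch {
    // Assumed separator shape: six-character prefix, blanks, 4+ underscores.
    static final Pattern SEPARATOR = Pattern.compile("(?m)^\\w{6}[ \\t]+_{4,}");
    static final Pattern LINE = Pattern.compile(
        "(?m)^\\s*[A-Z]{2}[0-9]{4}\\s\\s*(?<token>.*?)\\-(?<value>.*?)$");

    static final String SAMPLE = String.join("\n",
        "AA0000  One Token    -  Value",
        "AA0000  Another Token-  Another Value",
        "AA0000  YA Token     -  Yet Another",
        "AA0000  Yes, Another -  Even More",
        "AA0000",
        "AA0000  ______________________________",
        "AA0000  This line - while it will match the regex, shouldn't.");

    static List<String> pairs(String text) {
        // limit 2: split at the first separator only; element 0 is what precedes it.
        String head = SEPARATOR.split(text, 2)[0];
        List<String> result = new ArrayList<>();
        Matcher m = LINE.matcher(head);
        while (m.find()) {
            result.add(m.group("token").trim() + "=" + m.group("value").trim());
        }
        return result;
    }

    public static void main(String[] args) {
        pairs(SAMPLE).forEach(System.out::println);
    }
}
```

The line pattern stays exactly as in the question; the separator is located once, so only the four lines above it are matched.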

Cheers all -
-Mike.

Upvotes: 1

Views: 130

Answers (2)

Qtax

Reputation: 33908

I'd like the regex to only match the lines above the ____ separator line. Would this be possible with Regex alone?

Sure, it's possible. Add a lookahead that checks that such a line follows, something like:

(?=(?s).*^\w{6}[ \t]+_{4,})

Add this to the end of your expression to require that such a line follows. E.g.:

(?m)^\s*[A-Z]{2}[0-9]{4}\s\s*(?<token>.*?)\-(?<value>.*)$(?=(?s).*^\w{6}[ \t]+_{4,})

(I also added the m and s flags inline in the expression. The s flag is set inside the lookahead, so . matches newlines only there.)
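A quick check of the lookahead version, sketched in Java with the pattern above (only the backslashes are doubled for the Java string literal; Java, like .NET, keeps an inline (?s) scoped to the group it appears in). The AA0000 sample prefix is an assumption so that [A-Z]{2}[0-9]{4} matches, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Lookahead version: a line only matches if a separator line follows it.
class LookaheadMatch {
    static final Pattern LINE_BEFORE_SEPARATOR = Pattern.compile(
        "(?m)^\\s*[A-Z]{2}[0-9]{4}\\s\\s*(?<token>.*?)\\-(?<value>.*)$"
        // (?s) lets .* cross newlines while searching ahead for the separator.
        + "(?=(?s).*^\\w{6}[ \\t]+_{4,})");

    static final String SAMPLE = String.join("\n",
        "AA0000  One Token    -  Value",
        "AA0000  Another Token-  Another Value",
        "AA0000  YA Token     -  Yet Another",
        "AA0000  Yes, Another -  Even More",
        "AA0000",
        "AA0000  ______________________________",
        "AA0000  This line - while it will match the regex, shouldn't.");

    // Collects "token=value" for every match, trimming surrounding blanks.
    static List<String> pairs(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = LINE_BEFORE_SEPARATOR.matcher(text);
        while (m.find()) {
            result.add(m.group("token").trim() + "=" + m.group("value").trim());
        }
        return result;
    }

    public static void main(String[] args) {
        pairs(SAMPLE).forEach(System.out::println);
    }
}
```

The line below the separator is rejected because no separator follows it, so only the first four pairs come back.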

This is not very efficient, though: for every candidate line, the greedy .* in the lookahead runs to the end of the string and backtracks to the separator, so the engine rescans most of the input for each match.

Upvotes: 0

steinar

Reputation: 9653

Parsing such a text file with regex only would not be using the right tool for the job. Although possible, it would be both inefficient and unnecessarily complex.

I wouldn't load all the text into a string and split on that line either, as that's not the most efficient approach. I would rather read through the file in a loop, one line at a time, processing each line as needed, and stop processing when you reach the separator line.
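A minimal Java sketch of that loop, assuming the input arrives through a BufferedReader, an AA0000-style prefix, and the separator shape used in the lookahead answer (a six-character prefix, blanks, then underscores). The class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Line-by-line parsing: handle one line at a time, stop at the separator.
class LineByLineParser {
    // Per-line pattern; no (?m) needed since each match sees a single line.
    // \s+ is equivalent to the question's \s\s*.
    static final Pattern LINE = Pattern.compile(
        "^\\s*[A-Z]{2}[0-9]{4}\\s+(?<token>.*?)-(?<value>.*)$");
    // Assumed separator shape: six-character prefix, blanks, 4+ underscores.
    static final Pattern SEPARATOR = Pattern.compile("^\\w{6}[ \\t]+_{4,}");

    static List<String> parse(BufferedReader reader) throws IOException {
        List<String> result = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            if (SEPARATOR.matcher(line).lookingAt()) {
                break; // lines after the separator are never processed
            }
            Matcher m = LINE.matcher(line);
            if (m.matches()) {
                result.add(m.group("token").trim() + "=" + m.group("value").trim());
            }
        }
        return result;
    }

    // Convenience wrapper for text that is already in memory.
    static List<String> parseText(String text) {
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            return parse(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static final String SAMPLE = String.join("\n",
        "AA0000  One Token    -  Value",
        "AA0000  Another Token-  Another Value",
        "AA0000  YA Token     -  Yet Another",
        "AA0000  Yes, Another -  Even More",
        "AA0000",
        "AA0000  ______________________________",
        "AA0000  This line - while it will match the regex, shouldn't.");

    public static void main(String[] args) {
        parseText(SAMPLE).forEach(System.out::println);
    }
}
```

For a real file, pass Files.newBufferedReader(path) from java.nio.file to parse instead of using the in-memory wrapper; the per-line pattern stays simple because the separator check, not the regex, decides where processing ends.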

Upvotes: 1
