Reputation: 534
I'm teaching myself some regex while trying to parse a datasheet, and I suspect there's no easy way to do this in regex alone (in C#, sure!). Say I have a file with these lines:
0000AA One Token - Value
0000AA Another Token- Another Value
0000AA YA Token - Yet Another
0000AA Yes, Another - Even More
0000AA
0000AA ______________________________________________________________________
0000AA This line - while it will match the regex, shouldn't.
So I have an easy multi-line regex:
^\s*[A-Z]{2}[0-9]{4}\s\s*(?<token>.*?)\-(?<value>.*?)$
This loads all the tokens into the 'token' group and all the values into the 'value' group. Pretty simple! However, the regex also matches the bottom line, putting 'This line' into the token and 'while it will [...]' into the value.
Essentially, I'd like the regex to only match the lines above the ____ separator line. Would this be possible with Regex alone, or will I need to modify my incoming string first to .Split() on the ____ separator line?
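For comparison, the split-first idea can be sketched as follows. This is a Python sketch rather than C#, so the named groups are spelled `(?P<name>...)` instead of `(?<name>...)`. The sample text is modelled on the lines above but uses an `AA0000`-style prefix, which is an assumption made so that the lines satisfy the `[A-Z]{2}[0-9]{4}` part of the pattern as written. `SEPARATOR` and `pairs_before_separator` are hypothetical names.

```python
import re

# Sample input modelled on the question. The "AA0000" prefix is an assumption,
# chosen so the lines satisfy the [A-Z]{2}[0-9]{4} part of the pattern.
SAMPLE = """\
AA0000 One Token - Value
AA0000 Another Token- Another Value
AA0000 YA Token - Yet Another
AA0000 Yes, Another - Even More
AA0000
AA0000 ______________________________________________________________________
AA0000 This line - while it will match the regex, shouldn't.
"""

# A separator line: six-character prefix, blanks, then a run of underscores.
SEPARATOR = re.compile(r"^\w{6}[ \t]+_{4,}[ \t]*$", re.MULTILINE)

# The question's pattern, with named groups in Python syntax.
PAIR = re.compile(
    r"^\s*[A-Z]{2}[0-9]{4}\s\s*(?P<token>.*?)\-(?P<value>.*?)$",
    re.MULTILINE,
)


def pairs_before_separator(text):
    """Return (token, value) pairs from the part of text above the separator."""
    # Split once on the first separator line and keep only what precedes it.
    head = SEPARATOR.split(text, maxsplit=1)[0]
    return [(m["token"].strip(), m["value"].strip()) for m in PAIR.finditer(head)]
```

Calling `pairs_before_separator(SAMPLE)` yields the four pairs above the separator and nothing from the line below it.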
Cheers all -
-Mike.
Upvotes: 1
Views: 130
Reputation: 33908
I'd like the regex to only match the lines above the ____ separator line. Would this be possible with Regex alone?
Sure, it's possible. Add a lookahead that requires a separator line to appear somewhere later in the input, something like:
(?=(?s).*^\w{6}[ \t]+_{4,})
Add this to the end of your expression, e.g.:
(?m)^\s*[A-Z]{2}[0-9]{4}\s\s*(?<token>.*?)\-(?<value>.*)$(?=(?s).*^\w{6}[ \t]+_{4,})
(Also added the m and s flags in the expression.)
This is not very efficient though, as the regex engine will probably need to scan through most of the string for every match.
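A minimal sketch of the lookahead approach, written in Python rather than C#, so two things are adapted: named groups become `(?P<name>...)`, and the mid-pattern `(?s)` becomes the scoped form `(?s:.*)`, since Python does not accept a global inline flag in the middle of a pattern. The sample text uses an `AA0000`-style prefix, an assumption made so the lines satisfy `[A-Z]{2}[0-9]{4}`. `pairs_with_lookahead` is a hypothetical name.

```python
import re

# Sample input modelled on the question. The "AA0000" prefix is an assumption,
# chosen so the lines satisfy the [A-Z]{2}[0-9]{4} part of the pattern.
SAMPLE = """\
AA0000 One Token - Value
AA0000 Another Token- Another Value
AA0000 YA Token - Yet Another
AA0000 Yes, Another - Even More
AA0000
AA0000 ______________________________________________________________________
AA0000 This line - while it will match the regex, shouldn't.
"""

# Token/value pattern followed by a lookahead that only succeeds when a
# separator line (prefix, blanks, underscores) occurs later in the text.
# (?s:.*) lets the dot cross newlines inside the lookahead only.
PAIR_BEFORE_SEPARATOR = re.compile(
    r"^\s*[A-Z]{2}[0-9]{4}\s\s*(?P<token>.*?)\-(?P<value>.*)$"
    r"(?=(?s:.*)^\w{6}[ \t]+_{4,})",
    re.MULTILINE,
)


def pairs_with_lookahead(text):
    """Return (token, value) pairs for lines that precede a separator line."""
    return [
        (m["token"].strip(), m["value"].strip())
        for m in PAIR_BEFORE_SEPARATOR.finditer(text)
    ]
```

On `SAMPLE`, the line below the separator still satisfies the token/value part, but the lookahead fails there because no separator line follows it, so it is not returned.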
Upvotes: 0
Reputation: 9653
Parsing such a text file with regex only would not be using the right tool for the job. Although possible, it would be both inefficient and unnecessarily complex.
I would actually not load all the text into a string and split on this line either, as it's not the most efficient way of doing this. I would rather read through the file in a loop, one line at a time, processing each line as needed. Then stop processing when you reach this particular line.
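The loop described above might look roughly like this. It is a Python sketch, not C#, and the names `read_pairs`, `SEPARATOR` and `PAIR` are hypothetical. The per-line pattern is the question's expression with Python-style named groups, and the sample lines use an `AA0000`-style prefix, an assumption made so they satisfy `[A-Z]{2}[0-9]{4}`.

```python
import io
import re

# Sample input modelled on the question. The "AA0000" prefix is an assumption,
# chosen so the lines satisfy the [A-Z]{2}[0-9]{4} part of the pattern.
SAMPLE = """\
AA0000 One Token - Value
AA0000 Another Token- Another Value
AA0000 YA Token - Yet Another
AA0000 Yes, Another - Even More
AA0000
AA0000 ______________________________________________________________________
AA0000 This line - while it will match the regex, shouldn't.
"""

# Separator line: six-character prefix, blanks, then at least four underscores.
SEPARATOR = re.compile(r"^\s*\w{6}[ \t]+_{4,}")

# Single-line version of the question's pattern (no multi-line flag needed).
PAIR = re.compile(r"^\s*[A-Z]{2}[0-9]{4}\s+(?P<token>.*?)-(?P<value>.*)$")


def read_pairs(lines):
    """Collect (token, value) pairs line by line, stopping at the separator."""
    pairs = []
    for line in lines:
        line = line.rstrip("\r\n")
        if SEPARATOR.match(line):
            break  # everything after the separator is ignored
        m = PAIR.match(line)
        if m:  # lines without a token/value pair are skipped
            pairs.append((m["token"].strip(), m["value"].strip()))
    return pairs
```

`read_pairs` accepts any iterable of lines, so `read_pairs(io.StringIO(SAMPLE))` works for a quick check, and an open file object can be passed in the same way so that only one line is held in memory at a time.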
Upvotes: 1