Reputation: 25
My source data text looks something like this:
a1,a2,a3
a4,a5,a6
a7,a8,a9
test="1"
b1,b2,b3
b4,b5,b6
b7,b8,b9
test="2"
c1,c2,c3
c4,c5,c6
c7,c8,c9
test="3"
I need to parse this so the end result looks like this (appropriate “test” field included in each row):
a1,a2,a3,1
a4,a5,a6,1
a7,a8,a9,1
b1,b2,b3,2
b4,b5,b6,2
b7,b8,b9,2
c1,c2,c3,3
c4,c5,c6,3
c7,c8,c9,3
...etc
This is what I started with; it captures the fields correctly:
(?<f1>.*?),(?<f2>.*?),(?<f3>.*?)\s+
I understand I need to use lookarounds to capture and include the “test” field on each line.
So something like this added (using a positive lookahead)…
(?<f1>.*?),(?<f2>.*?),(?<f3>.*?)\s+(?=test="(?<test>.*?)")
This seems close, but instead of yielding all rows of data it yields only the last row with the included test value, as if it were consuming the lookahead row.
This expression and its captured groups are fed into a .NET application that inserts the captured groups as fields in a database table. The number of fields is always static (4 in the example above; field1=f1, field2=f2, field3=f3, field4=test), but the number of records varies.
Any guidance would be appreciated.
Upvotes: 1
Views: 98
Reputation: 9644
Parsing your data to extract the relevant values
You are almost there, but you need to let the lookahead skip the rows between the current one and the test line:
(?ms)(?<f1>\w+),(?<f2>\w+),(?<f3>\w+)\R(?=.*?^test="(?<test>\d+)")
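\R matches any kind of newline, and (?ms) is the inline way of turning on the multiline and dot-matches-all modifiers, so that .*?^test matches every line up to the test one. Note that \R is a PCRE/Java construct that the .NET regex engine does not recognize; in .NET, \r?\n does the same job for this data.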
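As a rough check of the pattern above, here is a minimal Python sketch run against the sample data from the question. It is a translation, not the .NET code itself: Python's re module writes named groups as (?P<name>...) instead of (?<name>...) and has no \R, so \r?\n stands in for it. The names SOURCE, ROW and rows are illustrative only.

```python
import re

# Sample input copied from the question.
SOURCE = (
    'a1,a2,a3\na4,a5,a6\na7,a8,a9\ntest="1"\n'
    'b1,b2,b3\nb4,b5,b6\nb7,b8,b9\ntest="2"\n'
    'c1,c2,c3\nc4,c5,c6\nc7,c8,c9\ntest="3"\n'
)

# (?ms): ^ matches at each line start, and . also matches newlines,
# so the lazy .*? inside the lookahead can cross the remaining data rows
# and stop at the nearest following test line.
ROW = re.compile(
    r'(?ms)(?P<f1>\w+),(?P<f2>\w+),(?P<f3>\w+)\r?\n'
    r'(?=.*?^test="(?P<test>\d+)")'
)

# One tuple per data row: (f1, f2, f3, test).
rows = [m.group("f1", "f2", "f3", "test") for m in ROW.finditer(SOURCE)]

for row in rows:
    print(",".join(row))
```

Because the lookahead consumes nothing, each match ends right after its own row, and the next match starts on the following row with the same test line still ahead of it.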
Again, your issue was that \s+ forced the lookahead to be on the line right after the one you were matching.
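To see that effect concretely, the sketch below runs the pattern from the question against the sample data. It is translated to Python's (?P<name>...) group syntax, which is an assumption of this example and not what the .NET application uses; the names SOURCE, ORIGINAL and matched are illustrative. Only the row sitting directly above each test line satisfies the lookahead, so one row per block is returned.

```python
import re

# Sample input copied from the question.
SOURCE = (
    'a1,a2,a3\na4,a5,a6\na7,a8,a9\ntest="1"\n'
    'b1,b2,b3\nb4,b5,b6\nb7,b8,b9\ntest="2"\n'
    'c1,c2,c3\nc4,c5,c6\nc7,c8,c9\ntest="3"\n'
)

# The question's pattern: after \s+ eats the line break, the lookahead
# must find test="..." immediately, i.e. on the very next line.
ORIGINAL = re.compile(
    r'(?P<f1>.*?),(?P<f2>.*?),(?P<f3>.*?)\s+(?=test="(?P<test>.*?)")'
)

matched = [m.group("f1", "f2", "f3", "test") for m in ORIGINAL.finditer(SOURCE)]

for row in matched:
    print(",".join(row))
```

The rows a1..a6, b1..b6 and c1..c6 are skipped because the line after each of them is another data row rather than a test line.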
Upvotes: 3