Reputation: 3470
Hi, I'm trying to get a regex to work. I have this text:
/Ffont2 45.83 Tf 252 980 Td (XX7445 DDA PURCHASE 05/28 04:48
MCDONALD'S F561 CHICAGO IL 105/29 10.25) Tj ET
0.000000 0.000000 0.000000 rg 0.000000 0.000000 0.000000 RG BT /Ffont2 45.83 Tf 252 937 Td ( 12333378 214904443) Tj ET
0.000000 0.000000 0.000000 rg 0.000000 0.000000 0.000000 RG BT /Ffont2 45.83 Tf 252 894 Td (CITI CARD ONLINE PAYMENT 12345678 05/29 87.99) Tj ET
0.000000 0.000000 0.000000 rg 0.000000 0.000000 0.000000 RG BT /Ffont2 45.83 Tf 252 851 Td (XX7445 DDA PURCHASE 0528 14:11 #03632 JEWEL CHICAGO IL 0529 97.60) Tj ET
and I'm trying to get everything from Td to Tj, like
Td (CITI CARD ONLINE PAYMENT 12345678 05/29 87.99) Tj
but I want to skip entries that have no date (a date must contain a forward slash). They must also have a money amount (which must contain a period), and I don't want the entry if it contains the word "purchase". So
Td (XX7445 DDA PURCHASE 0528 14:11 #03632 JEWEL CHICAGO IL 0529 97.60) Tj
would not be returned. Right now I have
(Td \()([^\)]*)([^\)]*)([/][^\)]*[.][^\)]*\) Tj)
for my regex, and that gets everything I want, but it still matches even when the text contains "purchase".
Upvotes: 0
Views: 545
Reputation: 13631
If you want to use a regex to ensure that your match doesn't contain the word 'PURCHASE', you could use a negative look-ahead such as the following:
@"(?![^\)]*PURCHASE)(Td \()([^\)]*)([^\)]*)([/][^\)]*[.][^\)]*\) Tj)"
The look-ahead prevents a match if the word 'PURCHASE' appears before the next ).
If you want to reject 'purchase' as well, you could add (?i) to the start of the regex, or pass the RegexOptions.IgnoreCase flag as the last argument to the Regex method call.
Looking closer at your regex, I notice that the second ([^\)]*) is redundant, as everything it could match will already be captured by the ([^\)]*) immediately preceding it.
It also seems strange that you are capturing (Td \() - the capture will always be Td ( , so why bother? And the last capture will start with / and end with ) Tj - is that what you intended?
I assume you know that you could replace the [/] with \/ and the [.] with \.
Anyway, to capture just what is inside the parentheses, you could use:
@"(?![^\)]*PURCHASE)Td \(([^\)]*\/[^\)]*\.[^\)]*)\) Tj";
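For what it's worth, here is a minimal sketch of how that last pattern could be run against the sample text from the question. The class name LookaheadDemo, the method ExtractEntries and the Sample constant are made up for illustration; only the pattern and RegexOptions.IgnoreCase come from the answer above. Note that IgnoreCase applies to the whole pattern, so it would also accept a lowercase "td (", which does not occur in this sample.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LookaheadDemo
{
    // Sample text copied from the question (first entry spans two lines).
    public const string Sample =
        "/Ffont2 45.83 Tf 252 980 Td (XX7445 DDA PURCHASE 05/28 04:48\n" +
        "MCDONALD'S F561 CHICAGO IL 105/29 10.25) Tj ET\n" +
        "0.000000 0.000000 0.000000 rg 0.000000 0.000000 0.000000 RG BT /Ffont2 45.83 Tf 252 937 Td ( 12333378 214904443) Tj ET\n" +
        "0.000000 0.000000 0.000000 rg 0.000000 0.000000 0.000000 RG BT /Ffont2 45.83 Tf 252 894 Td (CITI CARD ONLINE PAYMENT 12345678 05/29 87.99) Tj ET\n" +
        "0.000000 0.000000 0.000000 rg 0.000000 0.000000 0.000000 RG BT /Ffont2 45.83 Tf 252 851 Td (XX7445 DDA PURCHASE 0528 14:11 #03632 JEWEL CHICAGO IL 0529 97.60) Tj ET";

    // The negative look-ahead rejects any "Td (" whose parenthesised text
    // contains PURCHASE; group 1 requires a '/' followed later by a '.'.
    private const string Pattern =
        @"(?![^\)]*PURCHASE)Td \(([^\)]*\/[^\)]*\.[^\)]*)\) Tj";

    // Hypothetical helper: returns the text inside the parentheses of each match.
    public static List<string> ExtractEntries(string input)
    {
        var results = new List<string>();
        // IgnoreCase makes the look-ahead reject "purchase" in any casing.
        foreach (Match m in Regex.Matches(input, Pattern, RegexOptions.IgnoreCase))
        {
            results.Add(m.Groups[1].Value);
        }
        return results;
    }

    public static void Main()
    {
        foreach (string entry in ExtractEntries(Sample))
        {
            Console.WriteLine(entry);
        }
    }
}
```

On the sample, only the CITI CARD line should come back: the two PURCHASE entries are rejected by the look-ahead, and the entry with only numbers has no slash or period.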
Upvotes: 1
Reputation: 65059
What you have is fine. Regex can be used for this.. but why put a Formula 1 car on a go-kart track (<--- bad analogy..) and waste CPU cycles?
var matchesWithoutPurchase = Regex.Matches(yourInput, @"(Td \()([^\)]*)([^\)]*)([/][^\)]*[.][^\)]*\) Tj)")
.Cast<Match>().Where(x => !x.Value.ToLower().Contains("purchase"));
foreach (var match in matchesWithoutPurchase) {
Console.WriteLine(match);
}
Regex negative lookarounds are overkill for this.
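A self-contained sketch of the same filter-afterwards idea, in case it helps. The names FilterDemo and ExtractEntries are hypothetical, and two things differ from the snippet above as assumptions of mine: the pattern is trimmed to a single capture group, and the check uses IndexOf with StringComparison.OrdinalIgnoreCase rather than ToLower().Contains(...), which avoids allocating a lowered copy of every match.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class FilterDemo
{
    // No look-ahead here: the regex only enforces "has a '/' and later a '.'"
    // between "Td (" and ") Tj". Group 1 is the text inside the parentheses.
    private const string Pattern = @"Td \(([^\)]*/[^\)]*\.[^\)]*)\) Tj";

    // Hypothetical helper: match first, then drop anything mentioning "purchase".
    public static List<string> ExtractEntries(string input)
    {
        return Regex.Matches(input, Pattern)
            .Cast<Match>()
            .Select(m => m.Groups[1].Value)
            // Case-insensitive "does not contain" test done in plain C#.
            .Where(text => text.IndexOf("purchase", StringComparison.OrdinalIgnoreCase) < 0)
            .ToList();
    }

    public static void Main()
    {
        string input =
            "Tf 252 980 Td (XX7445 DDA PURCHASE 05/28 04:48\n" +
            "MCDONALD'S F561 CHICAGO IL 105/29 10.25) Tj ET\n" +
            "Tf 252 894 Td (CITI CARD ONLINE PAYMENT 12345678 05/29 87.99) Tj ET";

        foreach (string entry in ExtractEntries(input))
        {
            Console.WriteLine(entry);
        }
    }
}
```

Here the regex matches both entries, and the Where clause then discards the PURCHASE one, leaving only the CITI CARD line.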
Upvotes: 2