Lak

Reputation: 166

Regular expression that ignores the characters in between until it finds the mentioned pattern

I have to find a decimal number in a PDF that appears under the column "Charge".

I have found a regular expression that finds the decimal, and it works fine. But one of the PDFs has the text in the format below.

PDF text: Charge (country) Eighteen Thousand one hundred Eighty One and 75/100 18,181.75

Expected: 18,181.75

The regular expression I used to find the decimal after the text "Charge":

(Charge ([0-9]*)(\,?[ ]?[0-9])+(.[0-9]+))

So I want to ignore whatever comes between "Charge" and the decimal, and return only the decimal number. Any help?

Case 2: "18,181.75" may sometimes come before "Charge" as well, like "18,181.75 Charge some text here...".
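A quick check of the question's pattern, sketched in Python (an assumption: the question does not name a language, and only the first answer is .NET-specific). It suggests the pattern only matches when the digits come directly after "Charge ", and that even then the whole match includes the word "Charge"; it finds nothing in the PDF text or in case 2.

```python
import re

# The question's pattern, copied unchanged.
ORIGINAL = re.compile(r"(Charge ([0-9]*)(\,?[ ]?[0-9])+(.[0-9]+))")

simple = "Charge 18,181.75"
pdf_text = ("Charge (country) Eighteen Thousand one hundred "
            "Eighty One and 75/100 18,181.75")
case_2 = "18,181.75 Charge some text here..."

# Matches when the digits come right after "Charge " (the word is part of the match).
print(ORIGINAL.search(simple).group(0))  # Charge 18,181.75
# No match when words sit between "Charge" and the number...
print(ORIGINAL.search(pdf_text))         # None
# ...or when the number comes before "Charge".
print(ORIGINAL.search(case_2))           # None
```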

Upvotes: 0

Views: 202

Answers (3)

Wiktor Stribiżew

Reputation: 626748

You may make use of .NET regex unlimited-width lookbehinds:

Regex.Match(s, @"(?<=\bCharge\b.*)\d[\d,]*\.\d+|\d[\d,]*\.\d+(?=.*?\bCharge\b)")

See the regex demo

Details

  • (?<=\bCharge\b.*)\d[\d,]*\.\d+ - a position preceded by Charge as a whole word followed by any chars other than newline; from there it matches a digit followed by 0+ commas or digits, then a dot and 1+ digits
  • | - or
  • \d[\d,]*\.\d+(?=.*?\bCharge\b) - a digit followed by 0+ commas or digits, then a dot and 1+ digits, which must be followed by any 0+ chars other than newline (as few as possible) and then Charge as a whole word
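Python's built-in `re` module only accepts fixed-width lookbehinds, so `(?<=\bCharge\b.*)` cannot be used there as-is. A rough Python port, sketched under that assumption, consumes "Charge" in the first alternative and captures the number in a group instead; `find_charge_amount` is a hypothetical helper name, not part of any library.

```python
import re
from typing import Optional

# `re` rejects the unbounded lookbehind, so alternative 1 consumes "Charge ..."
# and captures the number; alternative 2 keeps the original lookahead.
CHARGE_AMOUNT = re.compile(
    r"\bCharge\b.*?(\d[\d,]*\.\d+)"       # case 1: number somewhere after "Charge"
    r"|(\d[\d,]*\.\d+)(?=.*?\bCharge\b)"  # case 2: number somewhere before "Charge"
)


def find_charge_amount(text: str) -> Optional[str]:
    """Return the first decimal tied to "Charge", or None (hypothetical helper)."""
    m = CHARGE_AMOUNT.search(text)
    if m is None:
        return None
    # Only one alternative takes part in a match, so the other group is None.
    return m.group(1) or m.group(2)


print(find_charge_amount(
    "Charge (country) Eighteen Thousand one hundred Eighty One and 75/100 18,181.75"))
print(find_charge_amount("18,181.75 Charge some text here..."))
```

Both calls should print 18,181.75. The lazy `.*?` makes the first alternative stop at the first decimal after "Charge", which mirrors what the leftmost match of the .NET lookbehind version returns for these inputs.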


Upvotes: 2

Mahmoud-Abdelslam

Reputation: 643

What about this:

(?<=[Cc]harge.)[0-9]+(?:,[0-9]+)*\.[0-9]+|[0-9]+(?:,[0-9]+)*\.[0-9]+(?=\s[Cc]harge)
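A Python sketch of this lookbehind/lookahead idea, assuming `[0-9]+(?:,[0-9]+)*\.[0-9]+` as the number pattern (digit runs, optional comma groups, an escaped dot, then decimals). Because `(?<=[Cc]harge.)` has a fixed width, Python's `re` accepts it; that same fixed width is also why it only finds a number separated from "Charge" by exactly one character.

```python
import re

# Assumed number pattern: digits, optional ",digits" groups, escaped dot, decimals.
NUMBER = r"[0-9]+(?:,[0-9]+)*\.[0-9]+"

# "charge" plus exactly one character before, or whitespace + "charge" after.
ADJACENT = re.compile(rf"(?<=[Cc]harge.){NUMBER}|{NUMBER}(?=\s[Cc]harge)")

pdf_text = ("Charge (country) Eighteen Thousand one hundred "
            "Eighty One and 75/100 18,181.75")

print(ADJACENT.search("Charge 18,181.75").group(0))                    # 18,181.75
print(ADJACENT.search("18,181.75 Charge some text here...").group(0))  # 18,181.75
# Words between "Charge" and the number defeat the one-character lookbehind.
print(ADJACENT.search(pdf_text))                                       # None
```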

Upvotes: 0

Manoj Choudhari

Reputation: 5624

The regular expression below should help you.

Charge.*?([0-9]+(?:,[0-9]+)*\.[0-9]{0,2})$

The decimal ends up in capture group 1.

Hope this works.
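A Python sketch of this answer's approach, assuming a capture group around the number and comparing a lazy `.*?` with a greedy `.*` before it. With the greedy version the backtracking hands the number part only the shortest tail that still fits, so the group comes out as "1.75"; the lazy version captures the full "18,181.75". The `$` anchor also means the number must end the string, so case 2 from the question is not matched.

```python
import re

pdf_text = ("Charge (country) Eighteen Thousand one hundred "
            "Eighty One and 75/100 18,181.75")

# Greedy `.*` swallows as much as possible, leaving only "1.75" for the group.
GREEDY = re.compile(r"Charge.*([0-9]+(?:,[0-9]+)*\.[0-9]{0,2})$")
# Lazy `.*?` stops at the first position where a number ending the string starts.
AFTER_CHARGE = re.compile(r"Charge.*?([0-9]+(?:,[0-9]+)*\.[0-9]{0,2})$")

print(GREEDY.search(pdf_text).group(1))        # 1.75
print(AFTER_CHARGE.search(pdf_text).group(1))  # 18,181.75
# Case 2 fails: the number comes before "Charge" and does not end the string.
print(AFTER_CHARGE.search("18,181.75 Charge some text here..."))  # None
```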

Upvotes: 0
