Reputation: 166
I have to find a decimal number in a PDF that appears under the column "Charge".
I came across a regular expression for finding the decimal, and it works fine in most cases. But in one of the PDFs the text is in the format below.
PDF text - Charge (country) Eighteen Thousand one hundred Eighty One and 75/100 18,181.75
Expected - 18,181.75
Regular expression used to find the decimal after the text "Charge": (Charge ([0-9]*)(\,?[ ]?[0-9])+(.[0-9]+))
So I want to ignore whatever comes between "Charge" and the decimal, and display only the decimal number. Any help?
Case 2: "18,181.75" may sometimes come before "Charge" as well, like "18,181.75 Charge some text here..."
Upvotes: 0
Views: 202
Reputation: 626748
You may make use of .NET regex unlimited-width lookbehinds:
Regex.Match(s, @"(?<=\bCharge\b.*)\d[\d,]*\.\d+|\d[\d,]*\.\d+(?=.*?\bCharge\b)")
See the regex demo
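As a minimal sketch of how the pattern above might be exercised, the snippet below runs it against the two sample strings from the question. The class name `ChargeExtractor` and the method `TryExtractCharge` are hypothetical names introduced only for this illustration; the pattern itself is unchanged.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ChargeExtractor
{
    // First branch: amount that appears somewhere after the word "Charge".
    // Second branch: amount that appears somewhere before the word "Charge".
    private static readonly Regex ChargeAmount = new Regex(
        @"(?<=\bCharge\b.*)\d[\d,]*\.\d+|\d[\d,]*\.\d+(?=.*?\bCharge\b)");

    // Hypothetical helper: returns true and the first matching amount,
    // or false and an empty string when nothing matches.
    public static bool TryExtractCharge(string text, out string amount)
    {
        Match m = ChargeAmount.Match(text);
        amount = m.Success ? m.Value : string.Empty;
        return m.Success;
    }

    public static void Main()
    {
        string[] samples =
        {
            "Charge (country) Eighteen Thousand one hundred Eighty One and 75/100 18,181.75",
            "18,181.75 Charge some text here..."
        };

        foreach (string s in samples)
        {
            // Prints 18,181.75 for both samples.
            Console.WriteLine(TryExtractCharge(s, out string amount) ? amount : "(no match)");
        }
    }
}
```

Note that "75/100" is skipped in the first sample because the pattern requires a dot followed by digits. The pattern is also case-sensitive as written, so a lowercase "charge" would need `RegexOptions.IgnoreCase`.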
Details
(?<=\bCharge\b.*)\d[\d,]*\.\d+
- a location preceded with Charge as a whole word and any 0+ chars other than newline after it, and then matches a digit followed with 0+ commas or digits, then a dot and 1+ digits
|
- or
\d[\d,]*\.\d+(?=.*?\bCharge\b)
- a digit followed with 0+ commas or digits, then a dot and 1+ digits, and that should be followed by any 0+ chars other than newline, as few as possible, and then Charge as a whole word
Upvotes: 2
Reputation: 643
What about this:
(?<=[Cc]harge.*)[0-9][0-9,]*\.[0-9]+|[0-9][0-9,]*\.[0-9]+(?=\s[Cc]harge)
Upvotes: 0
Reputation: 5624
The regular expression below should help you.
Charge.*[0-9]+([,]?[0-9]+)*\.([0-9]){0,2}$
Hope this works.
Upvotes: 0