Reputation: 3974
Can anyone help me get closer to the results I am trying to get?
I have this string being returned as OCR results after scanning an image:
7915-03226E3058-089179 Good luck for your draw on Wed 04 Sep 13 Your numbers A06 09 26 40 43 45 B 06 14 18 28 43 48 C 02 16 22 34 39 42 1111111 I I 111111111111111111111 3 plays x £1.00 for 1 draw = E 3.00 LAST WEEK, THERE WERE OVER 700,000 WINNERS ON LOTTO! 7915-032268058-089179 013779 Term. 46377201 E - •I Fill the box to void the ticket
I am attempting to pull out the values "A06 09 26 40 43 45", "B 06 14 18 28 43 48", and "C 02 16 22 34 39 42". To be honest, I don't need the "A", "B", and "C". I only need the six two-digit numbers (12 digits) after each one.
I have the regex of
[A-Z](\W*\d{2}){6}
But that is pulling out extra information that I don't want as can be seen here: http://regexr.com?372b7
Can anyone suggest how to get closer? Is there a better way to try and get to the ticket numbers?
Upvotes: 2
Views: 109
Reputation: 7307
Try this: one letter, then optional whitespace, then six 2-digit numbers that must be separated by at least one whitespace character each, with no whitespace required at the very end.
[A-Z]\s*((\d{2}\s+){5}\d{2})
Update:
You said you don't particularly want to retrieve the A/B/C/letter part. If your regex engine supports lookaround, you can use:
(?<=[A-Z]\s*)((\d{2}\s+){5}\d{2})
This matches only the numbers after the letter.
Update 2: Update 1 may not work. Many regex engines do not allow a variable-length pattern such as \s* inside a lookbehind. Just use the first suggestion
[A-Z]\s*((\d{2}\s+){5}\d{2})
and capture group 1 will be the numbers you're after.
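As a rough illustration, here is how the first suggestion might be applied, assuming Python's re module (the question does not name a language). The names ocr_text, ROW and rows are made up for this sketch, and the sample is a shortened copy of the OCR text from the question.

```python
import re

# Shortened sample of the OCR text from the question (assumption: the real
# input lays out the A/B/C rows the same way).
ocr_text = (
    "Wed 04 Sep 13 Your numbers A06 09 26 40 43 45 "
    "B 06 14 18 28 43 48 C 02 16 22 34 39 42 "
    "1111111 I I 111111111111111111111 3 plays x £1.00 for 1 draw = E 3.00"
)

# One letter, optional whitespace, then six 2-digit numbers separated by
# whitespace. Group 1 holds all six numbers; group 2 only drives the repeat.
ROW = re.compile(r"[A-Z]\s*((\d{2}\s+){5}\d{2})")

# finditer is used instead of findall because the pattern has two groups.
rows = [m.group(1) for m in ROW.finditer(ocr_text)]
print(rows)
```

Each entry in rows is one line of picks as a string; calling split() on it would give the individual numbers.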
Upvotes: 2
Reputation: 19076
Your problem mainly revolves around \W*, which allows any number (including zero) of non-word characters. So 111111111111 matches your capture-group pattern, and your entire regex matches it when it is preceded by a capital letter. It looks like you want two-digit pairs separated by whitespace; you could do that like this:
[A-Z]\s*(\d{2}\s+){6}
The \s+ makes sure there is at least one whitespace character after each pair.
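To see the difference between \W* and \s+ in practice, here is a small sketch, again assuming Python's re module. The barcode string is the run of 1s copied from the OCR text in the question; the variable names are only for illustration.

```python
import re

# The barcode-like run from the OCR text in the question.
barcode = "I I 111111111111111111111 3 plays"

original = re.compile(r"[A-Z](\W*\d{2}){6}")     # pattern from the question
stricter = re.compile(r"[A-Z]\s*(\d{2}\s+){6}")  # each pair must be followed by whitespace

# \W* may match nothing, so twelve consecutive 1s count as six "pairs".
loose_match = original.search(barcode)
print(loose_match.group(0) if loose_match else None)

# With \s+ after each pair, the unbroken run of digits no longer qualifies.
print(stricter.search(barcode))
```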
However, the pattern with \s+ (like the original) puts only the last pair of digits in the capture group, because a repeated capture group keeps only its final iteration. To fix that and also stop requiring trailing whitespace, this could be done:
[A-Z]\s*(\d{2}(?:\s+\d{2}){5})
Note that (?:...) creates a non-capturing group, so we can repeat it without disturbing the capture group. This puts all 6 pairs of numbers into capture group 1 (which is the only capture group). Also, the reason for \s* after the [A-Z] is that there appears to be optional whitespace after the leading character.
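A minimal sketch of both points, assuming Python's re module: a repeated capture group keeps only its last iteration, while the non-capturing version keeps all six pairs in group 1. The sample line is cut from the OCR text in the question, and the names line, ROW and tickets are hypothetical.

```python
import re

line = "Your numbers A06 09 26 40 43 45 B 06 14 18 28 43 48 C 02 16 22 34 39 42 1111111"

# A repeated capturing group keeps only its last iteration (here the final
# pair plus the whitespace that follows it).
repeated = re.search(r"[A-Z]\s*(\d{2}\s+){6}", line)
print(repr(repeated.group(1)))

# Repeating a non-capturing group instead leaves all six pairs in group 1.
ROW = re.compile(r"[A-Z]\s*(\d{2}(?:\s+\d{2}){5})")

# With exactly one capturing group, findall returns the group-1 strings.
tickets = [[int(n) for n in row.split()] for row in ROW.findall(line)]
print(tickets)
```

Converting to int drops the leading zeros; keep the strings instead if the ticket format matters.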
Upvotes: 5
Reputation: 25370
[A-Z]\s*([0-9]{2}\s+){6}
Any uppercase letter, then any amount of whitespace (or none), then a 2-digit number followed by one or more whitespace characters, repeated 6 times.
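One possible way to use this pattern, sketched with Python's re module as an assumption: since the repeated group only retains the last pair, the numbers are read from the whole match instead. The names line, ROW and picks are illustrative.

```python
import re

line = "A06 09 26 40 43 45 B 06 14 18 28 43 48 C 02 16 22 34 39 42 1111111"

ROW = re.compile(r"[A-Z]\s*([0-9]{2}\s+){6}")

# Group 1 only holds the last pair, so pull the pairs out of the full match.
picks = [re.findall(r"[0-9]{2}", m.group(0)) for m in ROW.finditer(line)]
print(picks)
```

Note that this pattern needs whitespace after the sixth pair, so a row at the very end of the text with nothing after it would not match.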
Upvotes: 2