Trevor Daniel

Reputation: 3974

Regular Expression Tweak

Can anyone help me get closer to the results I am trying to get?

I have this string being returned as OCR results after scanning an image:

7915-03226E3058-089179 Good luck for your draw on Wed 04 Sep 13 Your numbers A06 09 26 40 43 45 B 06 14 18 28 43 48 C 02 16 22 34 39 42 1111111 I I 111111111111111111111 3 plays x £1.00 for 1 draw = E 3.00 LAST WEEK, THERE WERE OVER 700,000 WINNERS ON LOTTO! 7915-032268058-089179 013779 Term. 46377201 E - •I Fill the box to void the ticket

I am attempting to pull out the values "A06 09 26 40 43 45", "B 06 14 18 28 43 48", and "C 02 16 22 34 39 42"

And to be honest, I don't need the "A", "B", and "C". I only need the six two-digit numbers after each one.

I have the regex of

[A-Z](\W*\d{2}){6}

But that pulls out extra information I don't want, as can be seen here: http://regexr.com?372b7

Can anyone suggest how to get closer? Is there a better way to try and get to the ticket numbers?
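A quick way to see what the pattern above picks up is to run it over the OCR text. This is a sketch using Python's `re` module (the question doesn't say which regex engine is in use, so Python is an assumption, and the variable names are illustrative). It collects each full match with `finditer`, because `findall` would only return the last repetition of the capture group.

```python
import re

# OCR text from the question, split across lines only for readability.
OCR_TEXT = (
    "7915-03226E3058-089179 Good luck for your draw on Wed 04 Sep 13 "
    "Your numbers A06 09 26 40 43 45 B 06 14 18 28 43 48 "
    "C 02 16 22 34 39 42 1111111 I I 111111111111111111111 "
    "3 plays x £1.00 for 1 draw = E 3.00 LAST WEEK, THERE WERE OVER "
    "700,000 WINNERS ON LOTTO! 7915-032268058-089179 013779 "
    "Term. 46377201 E - •I Fill the box to void the ticket"
)

# The pattern from the question. \W* may match zero characters,
# so digit pairs with nothing between them still satisfy each repetition.
ORIGINAL = re.compile(r"[A-Z](\W*\d{2}){6}")

# group(0) is the whole match; group(1) would only be the last repetition.
original_matches = [m.group(0) for m in ORIGINAL.finditer(OCR_TEXT)]
for text in original_matches:
    print(repr(text))
```

The three ticket lines come back, but so do `I 111111111111` and a chunk of the serial number, `O! 7915-03226805`. Since `\W*` can match nothing at all, adjacent digit pairs satisfy `(\W*\d{2}){6}` just as well as spaced ones.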

Upvotes: 2

Views: 109

Answers (3)

Shai

Reputation: 7307

Try this: one letter, then optional whitespace, then six two-digit numbers that must have at least one whitespace character between each of them, but don't need any whitespace at the very end.

[A-Z]\s*((\d{2}\s+){5}\d{2})
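A minimal sketch of how this could be used, assuming Python's `re` module and an excerpt of the OCR text (both are assumptions; the names are hypothetical). Because the pattern has two groups, `finditer` plus `group(1)` is used here: `re.findall` would return `(group 1, group 2)` tuples instead.

```python
import re

# Excerpt of the OCR text from the question (hypothetical variable name).
OCR_TEXT = (
    "Your numbers A06 09 26 40 43 45 B 06 14 18 28 43 48 "
    "C 02 16 22 34 39 42 1111111 I I 111111111111111111111 3 plays"
)

# Group 1: all six numbers. Group 2: the repeated "NN<whitespace>" unit,
# which only keeps its last repetition, so it is ignored here.
PATTERN = re.compile(r"[A-Z]\s*((\d{2}\s+){5}\d{2})")

numbers = [m.group(1) for m in PATTERN.finditer(OCR_TEXT)]
print(numbers)  # ['06 09 26 40 43 45', '06 14 18 28 43 48', '02 16 22 34 39 42']
```

The `I 111111111111...` run is skipped because `\s+` demands whitespace after every pair except the last.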

Update:

You said you don't particularly want to retrieve the A/B/C letter part. If your regex engine supports lookbehind, you can use:

(?<=[A-Z]\s*)((\d{2}\s+){5}\d{2})

This matches only the numbers after the letter.

Update 2: Update 1 may not work. Many engines (Python's re module, for example) require a lookbehind to have a fixed width, and the \s* inside it does not. Just use the first suggestion, [A-Z]\s*((\d{2}\s+){5}\d{2}), and capture group 1 will contain the numbers you're after.
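In Python's `re` module, for example, the lookbehind from Update 1 fails at compile time. The sketch below (Python is an assumption; the question doesn't name an engine) checks that, then shows one possible workaround that is not part of this answer: two fixed-width lookbehinds, which assume at most one whitespace character between the letter and the first number. That assumption holds for the `A06`, `B 06` and `C 02` lines in this OCR text but may not hold for other scans.

```python
import re

TICKET = "A06 09 26 40 43 45 B 06 14 18 28 43 48 C 02 16 22 34 39 42"

# Update 1's pattern: \s* inside the lookbehind has no fixed width.
try:
    re.compile(r"(?<=[A-Z]\s*)((\d{2}\s+){5}\d{2})")
    lookbehind_supported = True
except re.error as exc:
    # Python's re raises here: lookbehind must have a fixed width.
    print("lookbehind rejected:", exc)
    lookbehind_supported = False

# Hypothetical workaround: two fixed-width lookbehinds, one for a letter
# directly before the digits and one for a letter plus one whitespace char.
# This assumes at most one whitespace character after the letter.
FIXED_WIDTH = re.compile(r"(?:(?<=[A-Z])|(?<=[A-Z]\s))(\d{2}(?:\s+\d{2}){5})")

numbers = FIXED_WIDTH.findall(TICKET)
print(numbers)  # ['06 09 26 40 43 45', '06 14 18 28 43 48', '02 16 22 34 39 42']
```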

Upvotes: 2

Smern

Reputation: 19076

Your problem mainly revolves around \W*, which allows any number (including zero) of non-word characters. So 111111111111 would match your repeated capture group, and your entire regex if preceded by a capital letter. It looks like you want two-digit pairs separated by whitespace, which you could match like this:

[A-Z]\s*(\d{2}\s+){6}

The \s+ makes sure there is at least one whitespace character separating the pairs.
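Running this pattern over an excerpt of the OCR text in Python (the language and names here are assumptions) shows two properties worth knowing: each whole match ends with the whitespace after the sixth pair, and group 1 holds only its final repetition.

```python
import re

# Excerpt of the OCR text; the run of 1s follows the C line as in the scan.
TICKET = "A06 09 26 40 43 45 B 06 14 18 28 43 48 C 02 16 22 34 39 42 1111111"

# Smern's first pattern: six "two digits, then at least one whitespace" units.
PATTERN = re.compile(r"[A-Z]\s*(\d{2}\s+){6}")

whole = [m.group(0) for m in PATTERN.finditer(TICKET)]
last_pair = [m.group(1) for m in PATTERN.finditer(TICKET)]

print(whole)      # each match ends with the whitespace after the sixth pair
print(last_pair)  # ['45 ', '48 ', '42 ']: only the final repetition is kept
```

Because every pair, including the sixth, must be followed by whitespace, a ticket line at the very end of the string would not match at all.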


However, the above (like the original) only puts the last pair of digits in the capture group. To fix that and also ignore trailing whitespace, you could do this:

[A-Z]\s*(\d{2}(?:\s+\d{2}){5})

Note that (?:...) creates a non-capturing group, so we can repeat it without affecting the capture group. This puts all six pairs of numbers into capture group 1 (which will be the only capture group). Also, the \s* after [A-Z] is there because there appears to be optional whitespace after the leading character.
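A sketch of the final pattern in use, assuming Python's `re` module and an excerpt of the OCR text (the names are illustrative). With exactly one capture group, `re.findall` returns that group's text directly, which can then be split into integers.

```python
import re

# Excerpt of the OCR text from the question.
OCR_TEXT = (
    "Your numbers A06 09 26 40 43 45 B 06 14 18 28 43 48 "
    "C 02 16 22 34 39 42 1111111 I I 111111111111111111111"
)

# (?:\s+\d{2}){5} repeats without capturing, so group 1 is the only group.
PATTERN = re.compile(r"[A-Z]\s*(\d{2}(?:\s+\d{2}){5})")

# With exactly one group, re.findall returns that group's text for each match.
lines = PATTERN.findall(OCR_TEXT)

# Split each "06 09 26 40 43 45" string into a list of ints.
draws = [[int(n) for n in line.split()] for line in lines]
print(draws)  # [[6, 9, 26, 40, 43, 45], [6, 14, 18, 28, 43, 48], [2, 16, 22, 34, 39, 42]]
```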

Upvotes: 5

Jonesopolis

Reputation: 25370

[A-Z]\s*([0-9]{2}\s+){6}

Any uppercase letter, any amount of whitespace (or none), then a two-digit number followed by one or more whitespace characters, six times.

Upvotes: 2
