Reputation: 73
I'm not strong at algorithm design and have a tricky text-matching problem, so please take a look. I'm currently working in Java/Groovy.
I've got some text that looks like this:
AAAAA
AAAAA
CCCCC
any stuff here
111
any stuff here
AAAAA
stuff
AAAAA
stuff
AAAAA
BBBBB
stuff
222
stuff
BBBBB
My challenge is to grab every block of the form AAAAA stuff 111 stuff AAAAA, without grabbing any surrounding text. There are multiple AAAAA lines in the input, but I must only grab the pair closest to each 111 or 222, and I need to do this for every block of this type.
My regular expressions (not working) look like this:
/(\w{8}|\w{11}).*?(\w{3}).*?\1/
I've been playing around with a bunch of them and they either grab too much text or perform too slowly... if anyone has an idea of what I should be using for this type of problem, please let me know.
Edit: These are the two blocks I am trying to match:
AAAAA
CCCCC
any stuff here
111
any stuff here
AAAAA
and
BBBBB
stuff
222
stuff
BBBBB
I'd say this is pretty much like parsing improperly tagged XML. Anyway, thanks for looking.
Upvotes: 1
Views: 136
Reputation: 43703
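Use a regex pattern that captures the five-character delimiter as group 1 and uses a negative lookahead so that the delimiter cannot reappear before the closing copy: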
(?s)\b(\w{5})\b(?:(?!\1).)*?\b\w{3}\b(?:(?!\1).)*?\1
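Since the question mentions Java/Groovy, here is a minimal sketch of how the pattern above might be applied in Java. The class name `DelimitedBlocks` and the method `findBlocks` are hypothetical names chosen for illustration; the sample text is the one from the question, and the only change to the pattern is that backslashes are doubled for a Java string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class DelimitedBlocks {
    // Same pattern as above; backslashes are doubled for the Java string literal.
    // (?s) lets '.' match newlines, and (?!\1) stops the scan from crossing
    // another copy of the captured five-character delimiter.
    private static final Pattern BLOCK = Pattern.compile(
        "(?s)\\b(\\w{5})\\b(?:(?!\\1).)*?\\b\\w{3}\\b(?:(?!\\1).)*?\\1");

    // Returns every non-overlapping match, in the order it appears in the text.
    static List<String> findBlocks(String text) {
        List<String> blocks = new ArrayList<>();
        Matcher m = BLOCK.matcher(text);
        while (m.find()) {
            blocks.add(m.group());
        }
        return blocks;
    }

    public static void main(String[] args) {
        // Sample input copied from the question.
        String text = String.join("\n",
            "AAAAA", "AAAAA", "CCCCC", "any stuff here", "111", "any stuff here",
            "AAAAA", "stuff", "AAAAA", "stuff", "AAAAA",
            "BBBBB", "stuff", "222", "stuff", "BBBBB");
        for (String block : findBlocks(text)) {
            System.out.println(block);
            System.out.println("-----");
        }
    }
}
```

One thing to keep in mind: `\b\w{3}\b` matches any three-character word, so in the first block it is satisfied by `any` before the scan even reaches `111`. The extracted block is the same either way for this sample. If the middle marker is always numeric (an assumption, the question does not say so), replacing `\w{3}` with `\d{3}` would make the pattern stricter.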
Upvotes: 2