Reputation: 35
After a week of searching the web and trying different approaches, I give up. I am facing an issue with regex in Java and I am wondering if I can find some help here.
I am trying to find this "< < < < 06 76 > > " pattern in a huge string that I have to search through.
What I know is that between the last "<" and the first ">" there can only be digit characters, plus any number of spaces between the last ">" and the first "<". Also, between each "<" or ">" there can be from 1 to 5 spaces.
I was able to create part of a pattern to use for my search, but I can't move forward from there.
Here is what I was able to create as a search pattern.
String tag_open = "<\\s{0,4}<\\s{0,4}<\\s{0,4}<\\s{0,4}";
I am stuck trying to include the idea of "any numbers, not more than 4 digits, separated by 1 to 5 spaces".
Finally, I am able to "close" the pattern to be searched with
"\\s{0,4}>\\s{0,4}>\\s{0,4}"
Sorry for the long text. I am trying to be as detailed as possible. Thanks so much! Regards.
I think I forgot to say something... I actually did... There are 2 types of "tags" that I have to look for. One is " < < < < 06 76 > > " and the second one is " < < 39 85 > > > > ". Between each "<" and ">" there can be from 1 to 4 spaces, and the same number of spaces between the last "<" and the first digit. The same applies between the last digit and the first ">". Last, there are from 1 to 6 spaces between the numbers.
OK... I hope this is my last edit. :-) I have to find the positions of those TWO types of tags, which mark the beginning and the end of each paragraph. The beginning and the end of a paragraph are established by these patterns:
Start of paragraph: Four "<<<<"* + some spaces + 2 random digits + some spaces + 2 random digits + some spaces + Two ">>"*.
End of paragraph: Two "<<"* + some spaces + 2 random digits + some spaces + 2 random digits + some spaces + Four ">>>>"*.
Here is an example of a text paragraph:
< < < < 06 76 > > Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec sit amet mauris lorem. Etiam aliquam iaculis tellus, ac accumsan velit. Vivamus venenatis diam sit amet elementum sollicitudin. Curabitur nec finibus tellus. Proin vestibulum placerat diam. Sed eget risus volutpat, placerat arcu non, commodo ex. Vivamus et ipsum efficitur, ornare nisi sit amet, venenatis diam. Sed aliquet lacinia nulla eu mattis. Integer dapibus, odio a rhoncus porttitor, tellus ligula imperdiet sem, at semper magna arcu a mauris. Vestibulum accumsan ornare aliquet. Curabitur a mollis ex, a ullamcorper enim. Donec urna nibh, vestibulum ut gravida vel, posuere id elit. Proin ut fringilla turpis. < < 06 76 > > > >
< < < < 12 23 > > Morbi aliquet condimentum tempus. Fusce quis rutrum lacus. Curabitur blandit vestibulum lacinia. Ut ac maximus dolor. Suspendisse potenti. Sed quis turpis felis. Sed magna mauris, mattis non mi id, mollis posuere massa. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. Suspendisse dictum sapien bibendum dictum ultricies. Suspendisse sed lectus egestas, congue ligula quis, fringilla sapien. Nullam et odio elit. Nullam pellentesque nunc tellus, vitae pharetra lorem congue id. < < 12 23 > > > >
Again, sorry for the long post and the many last-minute edits.
Upvotes: 1
Views: 95
Reputation: 163632
You might use
<(?:\s{1,5}<)*\s*(?:\d+\s*)+>(?:\s{1,5}>)*
<
Match literally
(?:\s{1,5}<)*
Repeat 0+ times, matching 1-5 whitespace chars followed by <
\s*
Match optional whitespace chars
(?:\d+\s*)+
Repeat 1+ times, matching 1+ digits and optional whitespace chars
>
Match literally
(?:\s{1,5}>)*
Repeat 0+ times, matching 1-5 whitespace chars followed by >
Note that \s could also match a newline. In Java you might also use \h{1,5} to match horizontal whitespace chars.
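To make the difference between \s and \h concrete, here is a minimal sketch. It assumes Java 8 or later (where \h is available); the class name and the two sample strings are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: compares the \s and \h variants of the same pattern.
class HorizontalWhitespaceDemo {
    // Variant using \s, which also matches line breaks.
    static final Pattern WITH_S =
            Pattern.compile("<(?:\\s{1,5}<)*\\s*(?:\\d+\\s*)+>(?:\\s{1,5}>)*");
    // Variant using \h, which matches only horizontal whitespace (spaces, tabs).
    static final Pattern WITH_H =
            Pattern.compile("<(?:\\h{1,5}<)*\\h*(?:\\d+\\h*)+>(?:\\h{1,5}>)*");

    // True when the whole string is a single tag according to the pattern.
    static boolean matchesWhole(Pattern pattern, String candidate) {
        return pattern.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        String oneLine = "< < < < 06 76 > >";
        String splitByNewline = "< < < <\n06 76 > >";

        System.out.println(matchesWhole(WITH_S, oneLine));        // true
        System.out.println(matchesWhole(WITH_H, oneLine));        // true
        System.out.println(matchesWhole(WITH_S, splitByNewline)); // true
        System.out.println(matchesWhole(WITH_H, splitByNewline)); // false
    }
}
```

If a tag is never supposed to span two lines, the \h variant is probably the safer choice.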
Example in Java:
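// Requires: import java.util.regex.Matcher; import java.util.regex.Pattern;
String regex = "<(?:\\s{1,5}<)*\\s*(?:\\d+\\s*)+>(?:\\s{1,5}>)*";
// Four candidate lines; only those with digits and a closing > should match
String string = "< < < < 06 76 > >\n"
+ "< < < < > >\n"
+ "< < < < 06 76 >\n"
+ "< < < < 06 76 \n";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(string);
// Print every match found in the input
while (matcher.find()) {
System.out.println(matcher.group(0));
}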
Output
< < < < 06 76 > >
< < < < 06 76 >
EDIT
The pattern for the start of the paragraph
<(?:\s{1,4}<){3}(?:\s*\d\d){2}\s*>\s{1,4}>
The pattern for the end of the paragraph
<\s{1,4}<(?:\s*\d\d){2}\s*>(?:\s{1,4}>){3}
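Since the question asks for the positions of both tag types, the two patterns above could be used with Matcher.start() and Matcher.end(). The following is only a sketch: the class name, the helper method and the sample text are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class: collects the offsets of start and end tags.
class TagPositions {
    // Start of paragraph: four "<", two 2-digit numbers, two ">".
    static final Pattern START_TAG =
            Pattern.compile("<(?:\\s{1,4}<){3}(?:\\s*\\d\\d){2}\\s*>\\s{1,4}>");
    // End of paragraph: two "<", two 2-digit numbers, four ">".
    static final Pattern END_TAG =
            Pattern.compile("<\\s{1,4}<(?:\\s*\\d\\d){2}\\s*>(?:\\s{1,4}>){3}");

    // Returns {start, end} offsets for every match; end is exclusive.
    static List<int[]> positions(Pattern pattern, String text) {
        List<int[]> result = new ArrayList<>();
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            result.add(new int[] {matcher.start(), matcher.end()});
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "< < < < 06 76 > > Lorem ipsum. < < 06 76 > > > >";
        for (int[] p : positions(START_TAG, text)) {
            System.out.println("start tag at " + p[0] + ".." + p[1]);
        }
        for (int[] p : positions(END_TAG, text)) {
            System.out.println("end tag at " + p[0] + ".." + p[1]);
        }
    }
}
```

The paragraph text then lies between the end offset of a start tag and the start offset of the following end tag.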
If you want for example to get the content of the paragraph, you could use a capture group.
<(?:\s{1,4}<){3}(?:\s*\d\d){2}\s*>\s{1,4}>([\s\S]*?)<\s{1,4}<(?:\s*\d\d){2}\s*>(?:\s{1,4}>){3}
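In Java, the capture-group pattern could be used roughly like this. It is a sketch rather than a definitive implementation: the class name, the method name and the shortened sample text are invented for illustration, and trimming the captured text is an extra assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class: extracts the text between a start and an end tag.
class ParagraphExtractor {
    static final String START = "<(?:\\s{1,4}<){3}(?:\\s*\\d\\d){2}\\s*>\\s{1,4}>";
    static final String END = "<\\s{1,4}<(?:\\s*\\d\\d){2}\\s*>(?:\\s{1,4}>){3}";
    // Group 1 lazily captures everything (including newlines) up to the next end tag.
    static final Pattern PARAGRAPH = Pattern.compile(START + "([\\s\\S]*?)" + END);

    // Returns the content of each paragraph with surrounding whitespace trimmed.
    static List<String> paragraphs(String text) {
        List<String> result = new ArrayList<>();
        Matcher matcher = PARAGRAPH.matcher(text);
        while (matcher.find()) {
            result.add(matcher.group(1).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "< < < < 06 76 > > First paragraph. < < 06 76 > > > >\n"
                + "< < < < 12 23 > > Second paragraph. < < 12 23 > > > >";
        for (String paragraph : paragraphs(text)) {
            System.out.println(paragraph);
        }
    }
}
```

Note that this pattern does not check that the numbers in the end tag are the same as the numbers in the start tag; it simply pairs each start tag with the next end tag.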
Upvotes: 1
Reputation: 3001
Something like this?
String input = "< < < < 06 76 > > ";
//For all tags
Pattern pat = Pattern.compile("(< +)+([0-9]+ +)+(> +)+");
//For tag < < < < 06 76 > >
//Pattern pat = Pattern.compile("(< +){4}([0-9]+ +)+(> +)+");
//For tag < < 39 85 > > > >
//Pattern pat = Pattern.compile("(< +){2}([0-9]+ +)+(> +)+");
Matcher mat = pat.matcher(input);
while(mat.find()) {
System.out.println(mat.group());
}
//Prints:
//< < < < 06 76 > >
Upvotes: 1