kevin sufferdini

Reputation: 121

regex matching characters in a long string in java

I am building a JSP, but I am new to regex and am having some trouble. I have a very long string with a pattern that looks like this:

==SOME_ID== - item 1 - item 2 - item 3 .. item 100 == SOME_ID_2 == - item 1 - item 2 - item 3 ... item 100 == SOME_ID_3 == ...

So it has an "identifier" enclosed in '==' characters, followed by a dash-separated ("-") list of items. I am trying to extract the identifiers and their item elements. Once I have extracted the information from the string, I plan on constructing an XML document with it.

One more note, an "item" can be more than one word.

EDIT: Here is my code so far:

<%
String testStr = (String)pageContext.getAttribute("longStr");
String[] ids = null; 
String delimeterRegex = "(?i),==*==";
ids = testStr.split(delimeterRegex);
pageContext.setAttribute("ids", ids);
%>



<c:forEach items="${ids}" var="id">
    ${id}
</c:forEach>

Any help would be greatly appreciated. Thank you.

Upvotes: 0

Views: 377

Answers (2)

Bohemian

Reputation: 425003

Here's some code that will create a map of the name to the array of its values:

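Map<String, String[]> map = new HashMap<String, String[]>();
// split before each "== name ==" header (but not at the very start of the input)
for (String mapping : input.split("(?<!^)(?===\\s*\\w+\\s*==)")) {
    // keep only the name between the first pair of "=="
    String name = mapping.replaceAll("^==\\s*(\\w+).*", "$1");
    // drop the header and leading dash, trim trailing whitespace, then split on dashes
    String[] values = mapping.replaceAll("^==\\s*\\w+\\s*==\\s*-*\\s*", "").trim().split("\\s*-\\s*");
    map.put(name, values);
}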

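This first splits the input using a look-ahead that matches a "name" header. Look-aheads are zero-width (they don't consume any characters), so each name is preserved at the front of its chunk for the next step; the (?<!^) look-behind prevents a split at the very start of the input.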
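To see why the look-ahead matters, here is a small sketch that splits the same string twice: once on the header pattern itself, which consumes the names, and once on the zero-width look-ahead from the code above, which leaves each header in place. The class and method names are hypothetical and the input is a shortened, made-up sample.

```java
import java.util.Arrays;

public class LookaheadSplitDemo {
    // Splits before every "== NAME ==" header. The look-ahead is zero-width,
    // so each header stays at the front of its own chunk.
    // (?<!^) prevents a cut at position 0 of the input.
    static String[] splitKeepingHeaders(String input) {
        return input.split("(?<!^)(?===\\s*\\w+\\s*==)");
    }

    // For contrast: splitting on the header pattern itself consumes the headers.
    static String[] splitConsumingHeaders(String input) {
        return input.split("==\\s*\\w+\\s*==");
    }

    public static void main(String[] args) {
        String input = "==A== - x - y == B == - z";
        System.out.println(Arrays.toString(splitKeepingHeaders(input)));
        System.out.println(Arrays.toString(splitConsumingHeaders(input)));
    }
}
```

The look-ahead version yields "==A== - x - y " and "== B == - z", with the names intact. The consuming version loses both names and also returns an empty first element, because String.split keeps an empty leading substring when a non-empty match occurs at index 0.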

Each name-and-values string then has its name extracted, and its values part is split on the dashes. The regexes are written so that whitespace around the targets is trimmed.

I've tested it and it works well, stripping off any optional whitespace around the name and values.
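A minimal runnable sketch of the same approach, assuming a hypothetical NameValuesParser class and a shortened sample input in which one item spans several words. It swaps HashMap for LinkedHashMap so the IDs come back in input order, and calls trim() so the last item of each group loses the space left before the next header.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class NameValuesParser {
    // Maps each "== NAME ==" header to the dash-separated items that follow it.
    static Map<String, String[]> parse(String input) {
        // LinkedHashMap keeps the IDs in the order they appear in the input.
        Map<String, String[]> map = new LinkedHashMap<>();
        for (String mapping : input.split("(?<!^)(?===\\s*\\w+\\s*==)")) {
            String name = mapping.replaceAll("^==\\s*(\\w+).*", "$1");
            // Remove the header and leading dash; trim() drops the whitespace
            // left before the next header; then split on dashes.
            String[] values = mapping.replaceAll("^==\\s*\\w+\\s*==\\s*-*\\s*", "")
                    .trim()
                    .split("\\s*-\\s*");
            map.put(name, values);
        }
        return map;
    }

    public static void main(String[] args) {
        String input = "==SOME_ID== - item 1 - item 2 == SOME_ID_2 == - multi word item - item 3";
        for (Map.Entry<String, String[]> e : parse(input).entrySet()) {
            System.out.println(e.getKey() + " -> " + Arrays.toString(e.getValue()));
        }
    }
}
```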

Upvotes: 1

Sergey Kalinichenko

Reputation: 726559

You can use this regular expression:

==([^=]+)==([^=]+)(?=(?:=|$))

This expression captures the string between two pairs of equal signs, then takes everything up to the next = or the end of the string. The ID becomes the first capturing group and the data becomes the second. Groups are numbered from one, not from zero (group zero is special: it represents the entire match).
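A quick way to check the group numbering is to print every group of a single match. This sketch uses a hypothetical class name and a made-up one-block input. It also prints groupCount(), which counts only capturing groups: the (?:...) inside the look-ahead is non-capturing, so the count should be 2.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupNumberingDemo {
    private static final Pattern BLOCK = Pattern.compile("==([^=]+)==([^=]+)(?=(?:=|$))");

    // Returns a matcher positioned on the first "==ID== data" block.
    static Matcher firstMatch(String input) {
        Matcher m = BLOCK.matcher(input);
        if (!m.find()) {
            throw new IllegalArgumentException("no ==ID== block in: " + input);
        }
        return m;
    }

    public static void main(String[] args) {
        Matcher m = firstMatch("==ID== - a - b");
        System.out.println("group(0) = [" + m.group(0) + "]");  // the entire match
        System.out.println("group(1) = [" + m.group(1) + "]");  // the ID
        System.out.println("group(2) = [" + m.group(2) + "]");  // the data
        System.out.println("groupCount() = " + m.groupCount()); // capturing groups only
    }
}
```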

Here is a complete example:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String data = "==SOME_ID== - item 1 - item 2 - item 3 .. item 100 == SOME_ID_2 == - item 1 - item 2 - item 3 ... item 100 == SOME_ID_3 == ...";
Pattern p = Pattern.compile("==([^=]+)==([^=]+)(?=(?:=|$))");
Matcher m = p.matcher(data);
while (m.find()) {
    System.out.println("ID=" + m.group(1));   // first capturing group: the ID
    System.out.println("Data=" + m.group(2)); // second capturing group: the data
}

Demo on ideone.

ID=SOME_ID
Data= - item 1 - item 2 - item 3 .. item 100 
ID= SOME_ID_2 
Data= - item 1 - item 2 - item 3 ... item 100 
ID= SOME_ID_3 
Data= ...

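Once you have the data (i.e. group(2)), you can call String.split on the dash to separate out the individual data elements. Note that group(1) keeps any spaces around the ID (see " SOME_ID_2 " in the output), so you may want to trim() it as well.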
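Putting the two steps together, here is a sketch that collects each ID with its list of items. The class and method names are hypothetical, and it assumes items never contain '=' (already required by [^=]+) or '-', since both act as delimiters. It trims the ID and each item, and skips the empty piece that split produces before the first dash.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IdItemsExtractor {
    private static final Pattern BLOCK = Pattern.compile("==([^=]+)==([^=]+)(?=(?:=|$))");

    static Map<String, List<String>> extract(String data) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        Matcher m = BLOCK.matcher(data);
        while (m.find()) {
            // group(1) can carry spaces, e.g. " SOME_ID_2 ", so trim it.
            String id = m.group(1).trim();
            List<String> items = new ArrayList<>();
            // Split the data on dashes, allowing whitespace around them.
            for (String item : m.group(2).split("\\s*-\\s*")) {
                String trimmed = item.trim();
                // Skip the empty piece that precedes the first dash.
                if (!trimmed.isEmpty()) {
                    items.add(trimmed);
                }
            }
            result.put(id, items);
        }
        return result;
    }

    public static void main(String[] args) {
        String data = "==SOME_ID== - item 1 - item 2 == SOME_ID_2 == - multi word item - item 3";
        extract(data).forEach((id, items) -> System.out.println(id + " -> " + items));
    }
}
```

LinkedHashMap is used only to keep the IDs in input order, which may be convenient when the result is later turned into an XML document.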

Upvotes: 2
