Find substring pattern in Java

I have the following scenario:

I need to extract a few substrings from a single string.

Example main string:

<title><spring:message code='cdc.header.title'/><br></span><span><p></p> <spring:message code='cdc.accessdenied.title'/></title>

So I need to extract <spring:message code='cdc.header.title'/> and <spring:message code='cdc.accessdenied.title'/>.

That is, whatever spring tags are present, I want to retrieve them as a List<String>.

I don't want to use an XML parser; I want to use Java's Pattern and Matcher, because my file might not be well-formed.

Please help me with this. Thanks.

Upvotes: 1

Views: 260

Answers (4)

Bohemian

Reputation: 425218

With this approach, it can be done in just one line of code (updated with new requirement as per comment):

List<String> springTags = Arrays.asList(str.replaceAll("(?s)^.*?(?=<spring)|(?<=/>)(?!.*<spring).*?$", "").split("(?s)(?<=/>).*?(?=<spring|$)"));

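This works by first stripping everything before the first spring tag and after the last one, then splitting on whatever lies between the end of one tag (/>) and the start of the next (<spring). The (?s) flag lets . match line breaks, so the input can span several lines. It will actually extract all spring tags from any kind of input: whatever comes before, between, or after the spring tags is thrown away.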

Here's some test code:

String str = "<title><spring:message code='cdc.header.title'/> <span></span></br><spring:message code='cdc.accessdenied.title'/></title>";
List<String> springTags = Arrays.asList(str.replaceAll("^.*?(?=<spring)|(?<=/>)(?!.*<spring).*?$", "").split("(?<=/>).*?(?=<spring|$)"));
System.out.println(springTags);

Output:

[<spring:message code='cdc.header.title'/>, <spring:message code='cdc.accessdenied.title'/>]
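
Since the question asks for Pattern and Matcher specifically, here is a minimal sketch of that route as an alternative to the replace-and-split one-liner. The class name SpringTagFinder and the method findSpringTags are hypothetical, and the pattern assumes that every spring tag is self-closing, starts with <spring: and contains no > before its closing />.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SpringTagFinder {
    // Assumption: a tag runs from "<spring:" to the first "/>" and has no '>' inside.
    private static final Pattern SPRING_TAG = Pattern.compile("<spring:[^>]*?/>");

    // Collects every match in document order; the surrounding markup is ignored,
    // so the input does not have to be well-formed.
    static List<String> findSpringTags(String input) {
        List<String> tags = new ArrayList<>();
        Matcher matcher = SPRING_TAG.matcher(input);
        while (matcher.find()) {
            tags.add(matcher.group());
        }
        return tags;
    }

    public static void main(String[] args) {
        String str = "<title><spring:message code='cdc.header.title'/><br></span><span><p></p> "
                + "<spring:message code='cdc.accessdenied.title'/></title>";
        // prints [<spring:message code='cdc.header.title'/>, <spring:message code='cdc.accessdenied.title'/>]
        System.out.println(findSpringTags(str));
    }
}
```

Unlike the split-based version, this returns an empty list when the input contains no spring tags.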

Upvotes: 2

Radiodef

Reputation: 37875

Here's an example that does this in pure Java:

public static ArrayList<String> parseDocument(
        final String document,
        final String begin,
        final String end) {

    ArrayList<String> subs = new ArrayList<String>(0);

    document_parse:
        for (int i = 0, h, j, k; i < document.length(); ) {

            for (h = i, k = 0; k < begin.length(); h++, k++) {
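                // Stop only when the document itself runs out. Comparing h against
                // length() - begin.length() would skip a tag that starts near the end.
                if (h >= document.length()) {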
                    break document_parse;

                } else if (document.charAt(h) != begin.charAt(k)) {
                    i++;
                    continue document_parse;
                }
            }

            end_search:
                for ( ; ; h++) {
                    if (h > document.length() - end.length()) {
                        break document_parse;
                    }

                    for (j = h, k = 0; k < end.length(); j++, k++) {
                        if (document.charAt(j) != end.charAt(k)) {
                            continue end_search;
                        }
                    }

                    if (k == end.length()) {
                        break;
                    }
                }

            h += end.length();

            subs.add(document.substring(i, h));

            i = h;
        }

    return subs;
}

This kind of thing might be faster than regex. The loops are a bit complex, but I tested it and it works. For the input in the question you would call parseDocument(str, "<spring", "/>").
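
The same begin/end scan can also be sketched with String.indexOf, which hides the character-by-character loops. This is not the code above, only a shorter equivalent under the same assumptions (each match runs from begin to the first end that follows it). The class name DelimitedSubstrings and the method extract are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class DelimitedSubstrings {
    // Returns every substring that starts with `begin` and stops at the first
    // following `end` (inclusive), scanning left to right without regex.
    static List<String> extract(String document, String begin, String end) {
        if (begin.isEmpty() || end.isEmpty()) {
            // Empty delimiters would never advance the scan position.
            throw new IllegalArgumentException("begin and end must be non-empty");
        }
        List<String> subs = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = document.indexOf(begin, from);
            if (start < 0) {
                break; // no further opening delimiter
            }
            int stop = document.indexOf(end, start + begin.length());
            if (stop < 0) {
                break; // opening delimiter without a closing one
            }
            stop += end.length();
            subs.add(document.substring(start, stop));
            from = stop; // continue after the match just found
        }
        return subs;
    }

    public static void main(String[] args) {
        String str = "<title><spring:message code='cdc.header.title'/> <span></span></br>"
                + "<spring:message code='cdc.accessdenied.title'/></title>";
        // prints [<spring:message code='cdc.header.title'/>, <spring:message code='cdc.accessdenied.title'/>]
        System.out.println(extract(str, "<spring", "/>"));
    }
}
```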

Upvotes: 0

Ruchira Gayan Ranaweera

Reputation: 35577

<tag> something</tag>

You can extract "something" using an XML parser library.

Upvotes: 1

shikjohari

Reputation: 2288

You can use a DOM parser and parse the file as XML. I guess you will also have to retrieve other nodes, attributes, and values; a parser will really help you in that case.
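
As a rough sketch of the DOM route, the following reads the code attribute of each spring:message element with the JDK's built-in parser. It only works on well-formed input, so it would reject the string in the question as posted. The class name DomSpringCodes and the method messageCodes are hypothetical, and the sketch assumes the default, non-namespace-aware parser so that the undeclared spring: prefix is treated as part of the element name.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class DomSpringCodes {
    // Parses the string as XML and returns the code attribute of every
    // <spring:message> element, in document order.
    static List<String> messageCodes(String xml) {
        try {
            // Assumption: the default factory is not namespace-aware, so the
            // element name is matched literally as "spring:message".
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("spring:message");
            List<String> codes = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                codes.add(((Element) nodes.item(i)).getAttribute("code"));
            }
            return codes;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Malformed markup (for example an unclosed <br>) ends up here.
            throw new IllegalArgumentException("Could not parse input as XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<title><spring:message code='cdc.header.title'/><p/>"
                + "<spring:message code='cdc.accessdenied.title'/></title>";
        System.out.println(messageCodes(xml));
    }
}
```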

Upvotes: 0
