ulquiorra

Reputation: 945

Java regex or other way to find the strings between tags and the other parts of that string

I have a String like this:

String s = "AZERTY<em>ZA</em> QWERTY OK <em>NE</em>NO";

I want to extract the strings between <em> and </em> and build a StringBuilder containing all parts of the string in the right order. I do this because I need to identify and locate the extracted strings, but I also need to keep the entire string. The goal is to later put the entire string in an Excel sheet cell and apply a font to the text between <em> and </em>:

XSSFRichTextString xssfrt = new XSSFRichTextString(); // acts like a StringBuilder
xssfrt.append("AZERTY");
xssfrt.append("ZA", font);      // extract 1
xssfrt.append(" QWERTY OK ");   // keep spaces
xssfrt.append("NE", font);      // extract 2
xssfrt.append("NO");

Here is my regex, which extracts the desired strings, but I don't know how to build the StringBuilder with all parts in the right order:

Pattern p = Pattern.compile("<em>(.*?)</em>");
Matcher m = p.matcher(s);
while (m.find()) {
    m.group(1); // the text between <em> and </em>
}

Thank you very much

Upvotes: -1

Views: 93

Answers (4)

Shekhar Khairnar

Reputation: 2691

You need to do something like this:

// requires: import java.util.Arrays;
String str = "AZERTY<em>ZA</em> QWERTY OK <em>NE</em>NO";
StringBuilder stringBuilder = new StringBuilder();
// Split on both the opening and the closing tag
String[] parts = str.split("</?em>");

System.out.println("parts : " + Arrays.toString(parts));

for (String s : parts) {
    System.out.println("Part going to append :" + s);
    stringBuilder.append(s);
}
System.out.println("StringBuilder : " + stringBuilder.toString());

The output will be:

> parts : [AZERTY, ZA,  QWERTY OK , NE, NO]
> Part going to append :AZERTY
> Part going to append :ZA
> Part going to append : QWERTY OK 
> Part going to append :NE
> Part going to append :NO
> StringBuilder : AZERTYZA QWERTY OK NENO

Update:

Check the updated code:

String str = "AZERTY<em>ZA</em> QWERTY OK <em>NE</em>NO";

// Mark each word that is immediately followed by </em> with ":font", e.g. ZA:font
str = str.replaceAll("(\\w+)(?=</em>)", "$1:font");
// After replace: AZERTY<em>ZA:font</em> QWERTY OK <em>NE:font</em>NO

String[] parts = str.split("</?em>");
// After split: [AZERTY, ZA:font,  QWERTY OK , NE:font, NO]

XSSFRichTextString xssfrt = new XSSFRichTextString();

for (String s : parts) {
    // Parts carrying the ":font" marker came from inside <em>...</em>
    if (s.endsWith(":font")) {
        String text = s.substring(0, s.length() - ":font".length());
        xssfrt.append(text, font); // font: the font from the question; choose it per text if needed
    } else {
        xssfrt.append(s);
    }
}

Upvotes: 0

Ravikumar

Reputation: 901

You can use Matcher's appendReplacement(StringBuffer sb, String replacement) and appendTail(StringBuffer sb) methods to keep the parts in order, and a list to store the extracted strings. Something like this:

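import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static void main(String[] args) throws java.lang.Exception {
    String s = "AZERTY<em>ZA</em> QWERTY OK <em>NE</em>NO";
    List<String> extractedString = new ArrayList<String>();
    Pattern p = Pattern.compile("<em>(.*?)</em>");
    Matcher m = p.matcher(s);
    StringBuffer sb = new StringBuffer();

    while (m.find()) {
        String matchedString = m.group(1);
        extractedString.add(matchedString);
        // Copies the text before the match, then the extracted text without its tags.
        // quoteReplacement keeps '$' and '\' in the extracted text literal.
        m.appendReplacement(sb, Matcher.quoteReplacement(matchedString));
        sb.append(" "); // adds a space after each extract; drop this line to keep the original spacing
    }
    m.appendTail(sb); // copies whatever follows the last match

    System.out.println("String buffer = " + sb);
    System.out.println("List of extracted String = " + extractedString);
}

Output:

String buffer = AZERTYZA  QWERTY OK NE NO
List of extracted String = [ZA, NE]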

Upvotes: 1

dejvuth

Reputation: 7146

An easy fix is to add another group to match the string before <em>:

Pattern p = Pattern.compile("(.*?)<em>(.*?)</em>");

With it, m.group(1) refers to the string outside em, and m.group(2) is the one inside.

Of course, this won't include the last string outside em (NO in your example). So you might want to remember the index where the last match ends, e.g. int end = m.end(), and retrieve the remainder with s.substring(end).
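The two-group pattern plus the remembered end index can be put together roughly as follows. This is only a sketch under some assumptions: the tags are well-formed and not nested, and the names EmSegments, Segment and parse are made up for illustration. Pattern.DOTALL is an addition so that the text may contain line breaks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmSegments {

    /** One piece of the input; emphasized is true when it came from inside <em>...</em>. */
    static final class Segment {
        final String text;
        final boolean emphasized;

        Segment(String text, boolean emphasized) {
            this.text = text;
            this.emphasized = emphasized;
        }

        @Override
        public String toString() {
            return (emphasized ? "em:" : "plain:") + "\"" + text + "\"";
        }
    }

    // group(1): text before <em>, group(2): text inside <em>...</em>
    private static final Pattern EM =
            Pattern.compile("(.*?)<em>(.*?)</em>", Pattern.DOTALL);

    /** Splits s into ordered segments, assuming non-nested, well-formed tags. */
    static List<Segment> parse(String s) {
        List<Segment> segments = new ArrayList<>();
        Matcher m = EM.matcher(s);
        int end = 0; // index just past the last match
        while (m.find()) {
            if (!m.group(1).isEmpty()) {
                segments.add(new Segment(m.group(1), false));
            }
            segments.add(new Segment(m.group(2), true));
            end = m.end();
        }
        if (end < s.length()) {
            // The tail after the last </em>, which the pattern never matches
            segments.add(new Segment(s.substring(end), false));
        }
        return segments;
    }

    public static void main(String[] args) {
        for (Segment seg : parse("AZERTY<em>ZA</em> QWERTY OK <em>NE</em>NO")) {
            System.out.println(seg);
        }
    }
}
```

With such a list, the loop from the question becomes xssfrt.append(seg.text, font) for emphasized segments and xssfrt.append(seg.text) for the others, so the order and the spaces of the original string are preserved.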

Upvotes: 2

ernest_k

Reputation: 45339

String[] pieces = s.split("<.*?>");

This will split the string on anything surrounded by < and >. If your tag is always em, then you can use:

String[] pieces = s.split("</?em>");
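One property of this split that may help with the font question: as long as the tags are well-formed and not nested (an assumption, not something split checks), the pieces alternate, so even indices hold the text outside em and odd indices hold the text inside. A small sketch, where SplitParity and emphasizedParts are made-up names:

```java
import java.util.ArrayList;
import java.util.List;

class SplitParity {

    /** Returns the text that was inside <em>...</em>, assuming well-formed, non-nested tags. */
    static List<String> emphasizedParts(String s) {
        String[] pieces = s.split("</?em>");
        List<String> inside = new ArrayList<>();
        // Opening and closing tags alternate, so every odd index was inside a tag pair.
        for (int i = 1; i < pieces.length; i += 2) {
            inside.add(pieces[i]);
        }
        return inside;
    }

    public static void main(String[] args) {
        String s = "AZERTY<em>ZA</em> QWERTY OK <em>NE</em>NO";
        String[] pieces = s.split("</?em>");
        for (int i = 0; i < pieces.length; i++) {
            boolean emphasized = (i % 2 == 1);
            System.out.println((emphasized ? "em:    " : "plain: ") + "\"" + pieces[i] + "\"");
        }
        System.out.println(emphasizedParts(s));
    }
}
```

The parity also holds when the string starts with <em>, because split then keeps an empty leading piece at index 0.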

Upvotes: 0
