Reputation: 945
I have a String like this:
String s = "AZERTY<em>ZA</em> QWERTY OK <em>NE</em>NO";
I want to extract the strings between <em> and </em> and build a StringBuilder with all parts of the string in the right order. I do this because I need to identify and locate the extracted strings, but I also need to keep the entire string. The purpose of all this is to later put the entire String into an Excel sheet cell and apply a font to the strings that were between <em> and </em>, like this:
XSSFRichTextString xssfrt = new XSSFRichTextString(); // acts like a StringBuilder
xssfrt.append("AZERTY");
xssfrt.append("ZA", font); // extract 1
xssfrt.append(" QWERTY OK "); // keep spaces
xssfrt.append("NE", font); // extract 2
xssfrt.append("NO");
Here is my regex, which extracts the desired strings, but I don't know how to build the StringBuilder with all the parts in the right order :/
Pattern p = Pattern.compile("\\<em>(.*?)\\</em>");
Matcher m = p.matcher(s);
while (m.find())
{
    m.group(1); // the extracted text
}
Thank you very much
Upvotes: -1
Views: 93
Reputation: 2691
You need to do something like this:
// needs import java.util.Arrays;
String str = "AZERTY<em>ZA</em> QWERTY OK <em>NE</em>NO";
StringBuilder stringBuilder = new StringBuilder();
// split on both <em> and </em>; the pieces keep their original order
String[] parts = str.split("(<\\/?em>)");
System.out.println("parts : " + Arrays.toString(parts));
for (String s : parts) {
    System.out.println("Part going to append :" + s);
    stringBuilder.append(s);
}
System.out.println("StringBuilder : " + stringBuilder.toString());
Output will be:
> parts : [AZERTY, ZA,  QWERTY OK , NE, NO]
> Part going to append :AZERTY
> Part going to append :ZA
> Part going to append : QWERTY OK 
> Part going to append :NE
> Part going to append :NO
> StringBuilder : AZERTYZA QWERTY OK NENO
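Splitting alone no longer tells you which pieces were inside <em>, and that is what you need to choose the font. A minimal sketch of one way to recover it, assuming the tags are well-formed and never nested: since every <em> is closed by </em>, the pieces alternate outside/inside, so the odd indexes are the emphasized ones. This still holds when the string starts with <em>, because String.split keeps the empty leading piece produced by a non-empty match at index 0. The class name SplitParity and the method emphasized are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitParity {
    // Returns the pieces that were inside <em>...</em>.
    // Assumes well-formed, non-nested <em> tags.
    static List<String> emphasized(String str) {
        String[] parts = str.split("</?em>");
        List<String> result = new ArrayList<>();
        // Pieces alternate outside/inside, so odd indexes were inside <em>.
        for (int i = 1; i < parts.length; i += 2) {
            result.add(parts[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        String str = "AZERTY<em>ZA</em> QWERTY OK <em>NE</em>NO";
        String[] parts = str.split("</?em>");
        for (int i = 0; i < parts.length; i++) {
            // real code: odd -> xssfrt.append(parts[i], font); even -> xssfrt.append(parts[i])
            String where = (i % 2 == 1) ? "inside  " : "outside ";
            System.out.println(where + "[" + parts[i] + "]");
        }
        System.out.println(emphasized(str));            // [ZA, NE]
        System.out.println(emphasized("<em>A</em>B"));  // [A]: the empty leading piece keeps the parity
    }
}
```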
UPDATE:
Check the updated code:
String str = "AZERTY<em>ZA</em> QWERTY OK <em>NE</em>NO";
// replace the word that is followed by </em> with word:font, e.g. ZA:font
str = str.replaceAll("(\\w+)(?=\\<\\/em\\>)", "$1:font");
// After replace: AZERTY<em>ZA:font</em> QWERTY OK <em>NE:font</em>NO
String[] parts = str.split("(<\\/?em>)");
// After split: [AZERTY, ZA:font,  QWERTY OK , NE:font, NO]
// needs org.apache.poi.xssf.usermodel.XSSFWorkbook, XSSFFont and XSSFRichTextString
XSSFWorkbook workbook = new XSSFWorkbook(); // or the workbook you are already writing to
XSSFFont font = workbook.createFont();      // the font for the <em> parts (bold is just an example)
font.setBold(true);
XSSFRichTextString xssfrt = new XSSFRichTextString();
for (String s : parts) {
    // set the font according to the ":font" marker added by replaceAll
    if (s.contains(":")) {
        String[] subParts = s.split(":");
        xssfrt.append(subParts[0], font);
    } else {
        xssfrt.append(s);
    }
}
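One caveat with the ":font" marker: any ordinary text containing a colon (e.g. "Note: ") is taken for an emphasized piece and loses everything after the colon, and (\w+)(?=</em>) puts no marker on content whose last character is not a word character, such as <em>OK!</em>. A sketch of a different technique, assuming well-formed, non-nested tags: split with zero-width lookarounds so the tags stay in the token list, then track whether you are inside <em>. TagTokenizer, Segment, segments and render are hypothetical names; render only wraps emphasized text in [ ] to show the result.

```java
import java.util.ArrayList;
import java.util.List;

public class TagTokenizer {
    // One piece of the original text and whether it sat inside <em>...</em>.
    static final class Segment {
        final String text;
        final boolean emphasized;

        Segment(String text, boolean emphasized) {
            this.text = text;
            this.emphasized = emphasized;
        }
    }

    // Assumes well-formed, non-nested <em> tags; other text passes through unchanged.
    static List<Segment> segments(String str) {
        List<Segment> result = new ArrayList<>();
        boolean inside = false;
        // Zero-width split: break before every '<' and after every '>',
        // so "<em>" and "</em>" come back as tokens of their own.
        for (String token : str.split("(?=<)|(?<=>)")) {
            if (token.equals("<em>")) {
                inside = true;
            } else if (token.equals("</em>")) {
                inside = false;
            } else if (!token.isEmpty()) {
                result.add(new Segment(token, inside));
            }
        }
        return result;
    }

    // Display helper: wraps emphasized text in [ ]; real code would call
    // xssfrt.append(seg.text, font) or xssfrt.append(seg.text) instead.
    static String render(String str) {
        StringBuilder out = new StringBuilder();
        for (Segment seg : segments(str)) {
            out.append(seg.emphasized ? "[" + seg.text + "]" : seg.text);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(render("AZERTY<em>ZA</em> QWERTY OK <em>NE</em>NO"));
        System.out.println(render("Note: <em>OK!</em> done"));
    }
}
```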
Upvotes: 0
Reputation: 901
You can use Matcher's appendReplacement(StringBuffer sb, String replacement) and appendTail(StringBuffer sb) methods to keep the parts in order, and a list to store the extracted Strings. Something like this:
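import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String s = "AZERTY<em>ZA</em> QWERTY OK <em>NE</em>NO";
        List<String> extractedStrings = new ArrayList<String>();
        Pattern p = Pattern.compile("\\<em>(.*?)\\</em>");
        Matcher m = p.matcher(s);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String matchedString = m.group(1);
            extractedStrings.add(matchedString);
            // appends the text before the match, then the extracted string in place of <em>...</em>
            // (quoteReplacement keeps any $ or \ in it literal)
            m.appendReplacement(sb, Matcher.quoteReplacement(matchedString));
        }
        m.appendTail(sb); // appends the rest of the input after the last match
        System.out.println("String buffer = " + sb.toString());
        System.out.println("List of extracted String = " + extractedStrings.toString());
    }
}
Output:
String buffer = AZERTYZA QWERTY OK NENO
List of extracted String = [ZA, NE]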
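Since the question also asks to identify and locate the extracted strings, the same appendReplacement/appendTail loop can record where each one lands in the rebuilt text: right after appendReplacement, the extracted string occupies the last group(1).length() characters of the buffer. A sketch, assuming you then put the plain text in the cell and apply the font to those ranges (for example with POI's XSSFRichTextString.applyFont(startIndex, endIndex, font), end exclusive); ExtractWithOffsets and extract are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtractWithOffsets {
    // Rebuilds s without the <em> tags and records, in ranges, the
    // [start, end) position of every extracted string in the rebuilt text.
    static String extract(String s, List<int[]> ranges) {
        Matcher m = Pattern.compile("<em>(.*?)</em>").matcher(s);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String extracted = m.group(1);
            // Copies the text before the match, then the extracted text
            // (quoteReplacement keeps any $ or \ in it literal).
            m.appendReplacement(sb, Matcher.quoteReplacement(extracted));
            ranges.add(new int[] {sb.length() - extracted.length(), sb.length()});
        }
        m.appendTail(sb); // the text after the last </em>
        return sb.toString();
    }

    public static void main(String[] args) {
        List<int[]> ranges = new ArrayList<>();
        String plain = extract("AZERTY<em>ZA</em> QWERTY OK <em>NE</em>NO", ranges);
        System.out.println(plain);
        for (int[] r : ranges) {
            // with POI this could become xssfrt.applyFont(r[0], r[1], font)
            System.out.println(r[0] + ".." + r[1] + " -> " + plain.substring(r[0], r[1]));
        }
    }
}
```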
Upvotes: 1
Reputation: 7146
An easy fix is to add another group to match the string before <em>:
Pattern p = Pattern.compile("(.*?)<em>(.*?)</em>");
With it, m.group(1) refers to the string outside em, and m.group(2) is the one inside.
Of course, this won't include the last string outside em (NO in your example). So you might want to remember the index where the last match ended, e.g. int end = m.end(), and retrieve the rest with s.substring(end).
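Put together, the two-group pattern plus the m.end() tail looks roughly like this. It is a sketch: TwoGroups and pieces are hypothetical names, and [ ] only marks the pieces that came from group(2), where your code would call xssfrt.append(text, font). Empty group(1) matches (two <em> parts back to back) are skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TwoGroups {
    // Returns the pieces of s in order; text that was inside <em> is wrapped
    // in [ ] here, where the real code would append it with the font.
    static List<String> pieces(String s) {
        // add Pattern.DOTALL as a second argument if the text can contain line breaks
        Matcher m = Pattern.compile("(.*?)<em>(.*?)</em>").matcher(s);
        List<String> out = new ArrayList<>();
        int end = 0;
        while (m.find()) {
            if (!m.group(1).isEmpty()) {
                out.add(m.group(1));            // outside em
            }
            out.add("[" + m.group(2) + "]");    // inside em
            end = m.end();                      // where this match stopped
        }
        if (end < s.length()) {
            out.add(s.substring(end));          // the last string outside em ("NO")
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(pieces("AZERTY<em>ZA</em> QWERTY OK <em>NE</em>NO"));
    }
}
```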
Upvotes: 2
Reputation: 45339
String[] pieces = s.split("<.*?>");
This will split the string on anything surrounded by <>.
If your tag is always em, then you can use:
String[] pieces = s.split("</?em>");
Upvotes: 0