Reputation: 8933
I want to write a regex that removes the brackets surrounding [cent]:
String input1 = "this is a [cent] and [cent] string";
String output1 = "this is a cent and cent string";
But if it is nested, like:
String input2 = "this is a [cent[cent] and [cent]cent] string";
String output2 = "this is a cent[cent and cent]cent string";
I can only use replaceAll on the string, so how do I create the pattern in the code below, and what should the replacement string be?
Pattern rulerPattern1 = Pattern.compile("", Pattern.MULTILINE);
System.out.println(rulerPattern1.matcher(input1).replaceAll(""));
Update: nested brackets are well-formed and can be only two levels deep, like in case 2.
Edit:
If the string is "[<centd>[</centd>]purposes[<centd>]</centd>]", then the OUTPUT should be <centd>[</centd> purposes <centd>]</centd>.
Basically, if a bracket sits between a centd opening and closing tag, leave it there; otherwise remove it.
Upvotes: 1
Views: 1395
Reputation: 15010
This regex removes a bracket when it has whitespace on one side and a non-whitespace character on the other.
regex: (?<=\s)[\[\]](?=\S)|(?<=\S)[\[\]](?=\s)
replace with: an empty string
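In Java this fits the question's replaceAll-only constraint directly. A minimal sketch, assuming String.replaceAll is acceptable; the class and method names are made up for illustration, and the backslashes are doubled because the regex sits inside a Java string literal.

```java
public class WhitespaceBracketDemo {

    // Removes any [ or ] that has whitespace on one side and a non-whitespace
    // character on the other. A bracket at the very start or end of the string
    // is never removed, because a lookaround there has no character to test.
    static String stripEdgeBrackets(String s) {
        return s.replaceAll("(?<=\\s)[\\[\\]](?=\\S)|(?<=\\S)[\\[\\]](?=\\s)", "");
    }

    public static void main(String[] args) {
        System.out.println(stripEdgeBrackets("this is a [cent] and [cent] string"));
        System.out.println(stripEdgeBrackets("this is a [cent[cent] and [cent]cent] string"));
        System.out.println(stripEdgeBrackets("[<cent>[</cent>] and [<cent>]Chemotherapy services.</cent>]"));
    }
}
```

That start/end behaviour is why the leading [ and the trailing ] survive in the tagged sample.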
Sample input:
this is a [cent[cent] and [cent]cent] string
After replacement:
this is a cent[cent and cent]cent string
Sample input:
[<cent>[</cent>] and [<cent>]Chemotherapy services.</cent>]
After replacement:
[<cent>[</cent> and <cent>]Chemotherapy services.</cent>]
To address the edit to the question, this expression finds [<centd>[</centd>] and replaces it with <centd>[</centd> (likewise for the ] variant), and it finds [<centd>] or [</centd>] and removes just the outer square brackets.
regex: \[(<centd>[\[\]]<\/centd>)\]|\[(<\/?centd>)\]
replace with: $1$2
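A minimal Java sketch of the same replacement, with an illustrative class and method name. The \/ escapes are harmless in Java but unnecessary, so they are dropped here. Java substitutes an empty string for a group that did not take part in the match, which is what lets a single "$1$2" serve both alternatives.

```java
public class CentdBracketDemo {

    // Alternative 1: [<centd>[</centd>] or [<centd>]</centd>] -> keep the tagged bracket (group 1).
    // Alternative 2: [<centd>] or [</centd>] -> keep just the tag (group 2).
    private static final String REGEX =
            "\\[(<centd>[\\[\\]]</centd>)\\]|\\[(</?centd>)\\]";

    static String unwrap(String s) {
        // Whichever group did not participate is substituted as "".
        return s.replaceAll(REGEX, "$1$2");
    }

    public static void main(String[] args) {
        System.out.println(unwrap("[<centd>[</centd>]purposes[<centd>]</centd>]"));
        System.out.println(unwrap("[<centd>]x[</centd>]"));
    }
}
```

Brackets that are not adjacent to a centd tag, such as "[T]", match neither alternative and are left alone.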
Sample input:
[<centd>[</centd>]purposes[<centd>]</centd>]
After replacement:
<centd>[</centd>purposes<centd>]</centd>
Upvotes: 6
Reputation: 22392
You can use Java's Matcher to transform the brackets. Here is a start for you:
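import java.util.regex.Matcher;
import java.util.regex.Pattern;
String input = "this is a [cent[cent] and [cent]cent] string";
// Outer [...] that may contain [...] groups nested one level deep
Pattern p = Pattern.compile("\\[((?:[^\\[\\]]++|\\[[^\\[\\]]*+\\])*+)\\]");
Matcher m = p.matcher(input);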
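As written, the snippet stops after creating the Matcher. One hedged way to see what the pattern captures is to loop over find() and collect group 1; this sketch only inspects the matches and does not perform the replacement. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OuterBracketFinder {

    // Outer [...] that may contain plain text and [...] groups one level deep.
    static final Pattern OUTER =
            Pattern.compile("\\[((?:[^\\[\\]]++|\\[[^\\[\\]]*+\\])*+)\\]");

    // Returns the text inside each outermost pair of brackets (group 1).
    static List<String> innerTexts(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = OUTER.matcher(input);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(innerTexts("this is a [cent[cent] and [cent]cent] string"));
        System.out.println(innerTexts("this is a [cent] and [cent] string"));
    }
}
```

For the nested input the whole "[cent[cent] and [cent]cent]" is a single match, so deciding which inner brackets to keep still has to happen in Java code.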
Upvotes: -1
Reputation: 56829
From the question, I assume that there are no more than 2 levels of nested brackets, and that the brackets are balanced.
I further assume that you don't allow escaping of [].
I also assume that when there are nested brackets, only the first opening [ and the last closing ] of the inner brackets are preserved. The rest, i.e. the top-level brackets and the remaining inner brackets, are removed.
For example:
only[single] [level] outside[text more [text] some [text]moreeven[more]text[bracketed]] still outside
After replacement will become:
onlysingle level outsidetext more [text some textmoreevenmoretextbracketed] still outside
Apart from the assumptions above, I make no other assumptions.
If you can make the assumption about spacing before and after brackets, then you can use the simpler solution by Denomales. Otherwise, my solution below works without that assumption.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

private static String replaceBracket(String input) {
    // Search for singly and doubly bracketed text
    Pattern p = Pattern.compile("\\[((?:[^\\[\\]]++|\\[[^\\[\\]]*+\\])*+)\\]");
    Matcher matcher = p.matcher(input);
    StringBuffer output = new StringBuffer(input.length());
    while (matcher.find()) {
        // Take the text inside the outermost brackets
        String innerText = matcher.group(1);
        int startIndex = innerText.indexOf("[");
        int endIndex;
        String replacement;
        if (startIndex != -1) {
            // 2 levels of nesting
            endIndex = innerText.lastIndexOf("]");
            // Remove all [] except for the first [ and the last ]
            replacement =
                // Text before and including the first [
                innerText.substring(0, startIndex + 1) +
                // Text in between, stripped of all brackets []
                innerText.substring(startIndex + 1, endIndex).replaceAll("[\\[\\]]", "") +
                // Text after and including the last ]
                innerText.substring(endIndex);
        } else {
            // No nesting
            replacement = innerText;
        }
        // quoteReplacement stops appendReplacement from treating $ or \ in the text as group references
        matcher.appendReplacement(output, Matcher.quoteReplacement(replacement));
    }
    matcher.appendTail(output);
    return output.toString();
}
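On Java 9 or later, the same loop can be written more compactly with Matcher.replaceAll(Function<MatchResult, String>). This is an alternative sketch of the same rule, not the code above; the class name is illustrative. The string returned by the function is still parsed for $ and \ references, so it goes through Matcher.quoteReplacement.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceBracketJava9 {

    private static final Pattern OUTER =
            Pattern.compile("\\[((?:[^\\[\\]]++|\\[[^\\[\\]]*+\\])*+)\\]");

    static String replaceBracket(String input) {
        return OUTER.matcher(input).replaceAll(mr -> {
            String inner = mr.group(1);
            int start = inner.indexOf('[');
            if (start == -1) {
                // Single level: just drop the outer brackets
                return Matcher.quoteReplacement(inner);
            }
            int end = inner.lastIndexOf(']');
            // Keep the first [ and the last ], strip every bracket in between
            String kept = inner.substring(0, start + 1)
                    + inner.substring(start + 1, end).replaceAll("[\\[\\]]", "")
                    + inner.substring(end);
            return Matcher.quoteReplacement(kept);
        });
    }

    public static void main(String[] args) {
        System.out.println(replaceBracket("this is a [cent] and [cent] string"));
        System.out.println(replaceBracket("this is a [cent[cent] and [cent]cent] string"));
    }
}
```

Without quoteReplacement, bracketed text such as "[x$1]" would be misread as a group reference.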
The only part worth explaining here is the regex; for the rest, see the documentation of the Matcher class.
"\\[((?:[^\\[\\]]++|\\[[^\\[\\]]*+\\])*+)\\]"
In raw form (what you see when you print the string):
\[((?:[^\[\]]++|\[[^\[\]]*+\])*+)\]
Let's break it down (whitespace is insignificant):
\[                     # Outermost opening bracket
(                      # Capturing group 1
    (?:
        [^\[\]]++      # Text that doesn't contain []
        |              # OR
        \[[^\[\]]*+\]  # A nested bracket containing text without []
    )*+
)                      # End of capturing group 1
\]                     # Outermost closing bracket
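Because whitespace is insignificant, the breakdown can be compiled almost verbatim with Pattern.COMMENTS, where whitespace is ignored and # starts a comment that runs to the end of the line. A sketch, with illustrative class and method names, that checks it captures the same group 1 as the compact form:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommentedBracketRegex {

    // The annotated breakdown, compiled in COMMENTS mode.
    static final Pattern COMMENTED = Pattern.compile(
            "\\[                     # Outermost opening bracket\n"
          + "(                       # Capturing group 1\n"
          + "  (?:\n"
          + "    [^\\[\\]]++         # Text that doesn't contain []\n"
          + "  |                     # OR\n"
          + "    \\[[^\\[\\]]*+\\]   # A nested bracket containing text without []\n"
          + "  )*+\n"
          + ")                       # End of capturing group 1\n"
          + "\\]                     # Outermost closing bracket\n",
            Pattern.COMMENTS);

    // The same regex in its compact, single-line form.
    static final Pattern COMPACT =
            Pattern.compile("\\[((?:[^\\[\\]]++|\\[[^\\[\\]]*+\\])*+)\\]");

    // Returns group 1 of the first match, or null if there is none.
    static String firstGroup(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "this is a [cent[cent] and [cent]cent] string";
        System.out.println(firstGroup(COMMENTED, input));
        System.out.println(firstGroup(COMPACT, input));
    }
}
```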
I used the possessive quantifiers *+ and ++ to prevent the regex engine from backtracking. The version with normal greedy quantifiers, \[((?:[^\[\]]+|\[[^\[\]]*\])*)\], would still work, but it is slightly less efficient and can cause a StackOverflowError on large enough input.
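To check the "would still work" part, a small sketch that runs both versions over the example from the top of this answer and collects the matches; the class and helper names are illustrative. It does not try to reproduce the StackOverflowError, since that depends on input size and the thread's stack size.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsPossessive {

    static final Pattern POSSESSIVE =
            Pattern.compile("\\[((?:[^\\[\\]]++|\\[[^\\[\\]]*+\\])*+)\\]");
    static final Pattern GREEDY =
            Pattern.compile("\\[((?:[^\\[\\]]+|\\[[^\\[\\]]*\\])*)\\]");

    // Collects every full match of p in input, in order.
    static List<String> matches(Pattern p, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "only[single] [level] outside[text more [text] some [text]moreeven[more]text[bracketed]] still outside";
        System.out.println(matches(POSSESSIVE, s));
        System.out.println(matches(GREEDY, s));
    }
}
```

The two alternatives can never start on the same character, so on balanced input the greedy version reaches the same matches and only differs in how much backtracking state it may keep.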
Upvotes: 0
Reputation: 48444
If it's really only about finding brackets surrounding "cent", you could use the following approach (using lookbehinds and lookaheads):
Edited to keep some of the brackets, as per the expected output: this is now a combination of positive and negative lookbehinds and lookaheads. In other words, a regex is unlikely to be the general solution, but this one does work with the literals provided, and then some.
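import java.util.regex.Matcher;
import java.util.regex.Pattern;

// surrounding
String test1 = "this is a [cent] and [cent] string";
// pseudo-nested
String test2 = "this is a [cent[cent] and [cent]cent] string";
// nested
String test3 = "this is a [cent[cent]] and [cent]cent]] string";
// Remove runs of [ that are not preceded by "cent" but are followed by it,
// and runs of ] that are preceded by "cent" but not followed by it
Pattern pattern = Pattern.compile("((?<!cent)\\[+(?=cent))|((?<=cent)\\]+(?!cent))");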
Matcher matcher = pattern.matcher(test1);
if (matcher.find()) {
System.out.println(matcher.replaceAll(""));
}
matcher = pattern.matcher(test2);
if (matcher.find()) {
System.out.println(matcher.replaceAll(""));
}
matcher = pattern.matcher(test3);
if (matcher.find()) {
System.out.println(matcher.replaceAll(""));
}
Output:
this is a cent and cent string
this is a cent[cent and cent]cent string
this is a cent[cent and cent]cent string
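Since the question only allows replaceAll on the string, the same pattern can be handed straight to String.replaceAll; the find() checks above only decide whether anything gets printed. A minimal sketch (class and method names are illustrative):

```java
public class CentLookaroundDemo {

    // Drop runs of [ not preceded by "cent" but followed by it,
    // and runs of ] preceded by "cent" but not followed by it.
    static String strip(String s) {
        return s.replaceAll("((?<!cent)\\[+(?=cent))|((?<=cent)\\]+(?!cent))", "");
    }

    public static void main(String[] args) {
        System.out.println(strip("this is a [cent] and [cent] string"));
        System.out.println(strip("this is a [cent[cent] and [cent]cent] string"));
        System.out.println(strip("this is a [cent[cent]] and [cent]cent]] string"));
    }
}
```

The literal "cent" is baked into the lookarounds, so this only works for that exact word.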
Upvotes: 0
Reputation: 40904
Regular expressions are unfit for this purpose in the general case. Nested structures form a recursive grammar, not a regular one. (That's why you don't parse HTML with regular expressions, BTW.)
If you only have a limited depth of bracket nesting, you can write a regular expression for that. But you need to fix your nesting depth first, and the regexp will not be all that pretty.
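To give a feel for how a fixed-depth pattern grows, here is a hedged sketch that builds one programmatically. bracketRegex is a hypothetical helper, and each extra level simply wraps the previous pattern once more.

```java
import java.util.regex.Pattern;

public class BoundedNesting {

    // Builds a regex for one bracketed group nested at most `depth` levels deep
    // (depth >= 1). Every extra level embeds the whole previous pattern again.
    static String bracketRegex(int depth) {
        if (depth < 1) {
            throw new IllegalArgumentException("depth must be >= 1");
        }
        String inner = "\\[[^\\[\\]]*\\]";                // depth 1: [text]
        for (int d = 2; d <= depth; d++) {
            inner = "\\[(?:[^\\[\\]]|" + inner + ")*\\]"; // [text or a shallower group]
        }
        return inner;
    }

    public static void main(String[] args) {
        String s = "[cent[cent] and [cent]cent]";
        System.out.println(Pattern.matches(bracketRegex(1), s)); // false
        System.out.println(Pattern.matches(bracketRegex(2), s)); // true
        System.out.println(bracketRegex(3));
    }
}
```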
Upvotes: 0