user2818196

Reputation: 69

Regex to remove round brackets from a string

I have a string:

String s="[[Identity (philosophy)|unique identity]]";

I need to parse it into two values:

s1 = Identity_philosophy
s2 = unique identity

I have tried the following code:

Pattern p = Pattern.compile("(\\[\\[)(\\w*?\\s\\(\\w*?\\))(\\s*[|])\\w*(\\]\\])");
Matcher m = p.matcher(s);
while (m.find())
{
    ....
}

But the pattern does not match.

Please help.

Thanks

Upvotes: 6

Views: 668

Answers (2)

Ryszard Czech

Reputation: 18641

Use

String s="[[Identity (philosophy)|unique identity]]";
String[] results = s.replaceAll("^\\Q[[\\E|]]$", "")    // Delete double brackets at start/end
      .replaceAll("\\s+\\(([^()]*)\\)","_$1")           // Replace spaces and parens with _
       .split("\\Q|\\E");                               // Split with pipe
System.out.println(results[0]);
System.out.println(results[1]);

Output:

Identity_philosophy
unique identity
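
To make the chain easier to follow, here is a minimal sketch that runs the same three calls one at a time and prints each intermediate string. The class and method names (ChainSteps, stripOuterBrackets, underscoreParenthetical, splitOnPipe) are hypothetical and only there for illustration; the regexes are the ones from the snippet above.

```java
public class ChainSteps {
    // Hypothetical helper: deletes a leading "[[" or a trailing "]]".
    static String stripOuterBrackets(String s) {
        return s.replaceAll("^\\Q[[\\E|]]$", "");
    }

    // Hypothetical helper: turns " (text)" into "_text".
    static String underscoreParenthetical(String s) {
        return s.replaceAll("\\s+\\(([^()]*)\\)", "_$1");
    }

    // Hypothetical helper: splits on a literal pipe character.
    static String[] splitOnPipe(String s) {
        return s.split("\\Q|\\E");
    }

    public static void main(String[] args) {
        String s = "[[Identity (philosophy)|unique identity]]";

        String step1 = stripOuterBrackets(s);
        System.out.println(step1); // Identity (philosophy)|unique identity

        String step2 = underscoreParenthetical(step1);
        System.out.println(step2); // Identity_philosophy|unique identity

        String[] parts = splitOnPipe(step2);
        System.out.println(parts[0]); // Identity_philosophy
        System.out.println(parts[1]); // unique identity
    }
}
```

Note that split() will throw no error on input without a pipe, but parts[1] would then be out of bounds, so real code would probably want to check parts.length first.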

Upvotes: 1

Wiktor Stribiżew

Reputation: 627468

You may use

String s="[[Identity (philosophy)|unique identity]]";
Matcher m = Pattern.compile("\\[{2}(.*)\\|(.*)]]").matcher(s);
if (m.matches()) {
    System.out.println(m.group(1).replaceAll("\\W+", " ").trim().replace(" ", "_")); // => Identity_philosophy
    System.out.println(m.group(2).trim()); // => unique identity
}

See a Java demo.

Details

With matches(), the "\\[{2}(.*)\\|(.*)]]" pattern must match the whole string, so it behaves like ^\[{2}(.*)\|(.*)]]\z: it matches [[ at the start of the string, captures any 0 or more chars other than line break chars, as many as possible, into Group 1, matches a |, captures any 0 or more chars other than line break chars, as many as possible, into Group 2, and then matches ]] at the end of the string. See the regex demo.
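
The difference between matches() and find() is easy to check. Below is a small sketch using the same pattern; the class name MatchesVsFind and the helpers isWholeLink and containsLink are made up for this illustration, as is the "embedded" sample string.

```java
import java.util.regex.Pattern;

public class MatchesVsFind {
    // Same pattern as in the snippet above.
    static final Pattern LINK = Pattern.compile("\\[{2}(.*)\\|(.*)]]");

    // Hypothetical helper: true only if the entire input is one [[target|label]] link.
    static boolean isWholeLink(String input) {
        return LINK.matcher(input).matches();
    }

    // Hypothetical helper: true if a [[target|label]] link occurs anywhere in the input.
    static boolean containsLink(String input) {
        return LINK.matcher(input).find();
    }

    public static void main(String[] args) {
        String exact = "[[Identity (philosophy)|unique identity]]";
        String embedded = "see [[Identity (philosophy)|unique identity]] here"; // assumed sample

        System.out.println(isWholeLink(exact));     // true
        System.out.println(isWholeLink(embedded));  // false: text surrounds the link
        System.out.println(containsLink(embedded)); // true
    }
}
```

If the links are embedded in longer text, find() is probably the better fit, but then the greedy .* may span several links on one line, so the pattern would likely need tightening.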

The contents of Group 2 can be trimmed of whitespace and used as is, but Group 1 should be preprocessed by replacing each chunk of 1+ non-word chars with a space (.replaceAll("\\W+", " ")), then trimming the result (.trim()), and finally replacing all spaces with _ (.replace(" ", "_")).
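
As a rough illustration of the Group 1 preprocessing, the three calls can be split out so each intermediate value is visible. The class name NormalizeTitle and the method toUnderscored are hypothetical names chosen for this sketch.

```java
public class NormalizeTitle {
    // Hypothetical helper wrapping the three steps described above.
    static String toUnderscored(String raw) {
        // Step 1: every run of non-word chars becomes one space.
        // "Identity (philosophy)" -> "Identity philosophy "
        String spaced = raw.replaceAll("\\W+", " ");

        // Step 2: drop the leading/trailing spaces left by step 1.
        // "Identity philosophy " -> "Identity philosophy"
        String trimmed = spaced.trim();

        // Step 3: join the remaining words with underscores.
        // "Identity philosophy" -> "Identity_philosophy"
        return trimmed.replace(" ", "_");
    }

    public static void main(String[] args) {
        System.out.println(toUnderscored("Identity (philosophy)")); // Identity_philosophy
    }
}
```

Keep in mind that \W also matches punctuation such as hyphens and apostrophes, so a title containing those would have them replaced with underscores as well.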

Upvotes: 0
