Reputation: 69
I have a string:
String s="[[Identity (philosophy)|unique identity]]";
I need to parse it into:
s1 = Identity_philosophy
s2 = unique identity
I have tried the following code:
Pattern p = Pattern.compile("(\\[\\[)(\\w*?\\s\\(\\w*?\\))(\\s*[|])\\w*(\\]\\])");
Matcher m = p.matcher(s);
while(m.find())
{
....
}
But the pattern does not match.
Please help.
Thanks
Upvotes: 6
Views: 668
Reputation: 18641
Use
String s="[[Identity (philosophy)|unique identity]]";
String[] results = s.replaceAll("^\\Q[[\\E|]]$", "") // Delete double brackets at start/end
.replaceAll("\\s+\\(([^()]*)\\)","_$1") // Replace spaces and parens with _
.split("\\Q|\\E"); // Split with pipe
System.out.println(results[0]);
System.out.println(results[1]);
Output:
Identity_philosophy
unique identity
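As a minimal sketch, the same replace-and-split chain can be wrapped in a small helper. The class name WikiLinkSplit and the method parse are hypothetical names chosen for illustration, and Pattern.quote("|") is used in place of the hand-written "\\Q|\\E" (it produces the same quoted pattern). It assumes the input is exactly one [[target|label]] string with a single pipe.

```java
import java.util.regex.Pattern;

class WikiLinkSplit {
    // Hypothetical helper: returns {target, label} for one "[[target|label]]" string.
    static String[] parse(String s) {
        return s.replaceAll("^\\Q[[\\E|]]$", "")             // strip leading [[ and trailing ]]
                .replaceAll("\\s+\\(([^()]*)\\)", "_$1")     // " (philosophy)" becomes "_philosophy"
                .split(Pattern.quote("|"));                  // split on a literal pipe
    }

    public static void main(String[] args) {
        String[] results = parse("[[Identity (philosophy)|unique identity]]");
        System.out.println(results[0]);
        System.out.println(results[1]);
    }
}
```

Note that split takes a regex, so an unquoted "|" would be read as an empty alternation and split between every character; that is why the pipe is quoted.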
Upvotes: 1
Reputation: 627468
You may use
String s="[[Identity (philosophy)|unique identity]]";
Matcher m = Pattern.compile("\\[{2}(.*)\\|(.*)]]").matcher(s);
if (m.matches()) {
System.out.println(m.group(1).replaceAll("\\W+", " ").trim().replace(" ", "_")); // => Identity_philosophy
System.out.println(m.group(2).trim()); // => unique identity
}
See a Java demo.
Details
The "\\[{2}(.*)\\|(.*)]]" pattern, when used with matches(), is treated as ^\[{2}(.*)\|(.*)]]\z because matches() requires the entire string to match. It matches a string that starts with [[, then matches and captures any 0 or more chars other than line break chars, as many as possible, into Group 1, then matches a |, then matches and captures any 0 or more chars other than line break chars, as many as possible, into Group 2, and then matches ]]. See the regex demo.
The contents of Group 2 can be trimmed of whitespace and used as is, but Group 1 should be preprocessed by replacing every chunk of 1+ non-word characters with a space (.replaceAll("\\W+", " ")), then trimming the result (.trim()), and replacing all spaces with _ (.replace(" ", "_")) as the final touch.
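If the input may contain several such links (the while (m.find()) loop in the question hints at this, but it is an assumption), the greedy (.*) groups would span from the first [[ to the last ]]. A rough sketch of one way around that is to use negated character classes with find(). The class name WikiLinkFinder, the method findLinks, and the sample text are hypothetical, and the pattern assumes neither part contains | or ].

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WikiLinkFinder {
    // Assumption: the target contains no '|' or ']', and the label contains no ']'.
    private static final Pattern LINK =
            Pattern.compile("\\[\\[([^|\\]]*)\\|([^\\]]*)]]");

    // Hypothetical helper: returns one {target, label} pair per link found in the text.
    static List<String[]> findLinks(String text) {
        List<String[]> links = new ArrayList<>();
        Matcher m = LINK.matcher(text);
        while (m.find()) {
            // Same Group 1 clean-up as above: non-word runs -> space, trim, spaces -> "_"
            String target = m.group(1).replaceAll("\\W+", " ").trim().replace(" ", "_");
            String label = m.group(2).trim();
            links.add(new String[] {target, label});
        }
        return links;
    }

    public static void main(String[] args) {
        String text = "See [[Identity (philosophy)|unique identity]] and [[Personal identity|self]].";
        for (String[] link : findLinks(text)) {
            System.out.println(link[0] + " -> " + link[1]);
        }
    }
}
```

Keep in mind that \W treats non-ASCII letters as non-word characters by default, so titles with accented letters would need Pattern.UNICODE_CHARACTER_CLASS or a different character class.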
Upvotes: 0