Reputation: 1731
Apologies for my poor understanding of the regex world. I'm trying to split a text using a regex. Here's what I'm doing right now. Please consider the following string:
import java.util.LinkedList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String input = "Name:\"John Adam\" languge:\"english\" Date:\" August 2011\"";
Pattern pattern = Pattern.compile(".*?\\:\\\".*?\\\"\\s*");
Matcher matcher = pattern.matcher(input);
List<String> keyValues = new LinkedList<>();
while (matcher.find()) {
    System.out.println(matcher.group());
    keyValues.add(matcher.group());
}
System.out.println(keyValues);
I get the right output, which is what I'm looking for:
Name:"John Adam"
languge:"english"
Date:" August 2011"
Now I'm struggling to make it a little more generic. For example, I've added a new value, Audience:(user), to the input string in a different format, i.e. the quotes (") are replaced by parentheses ():
String input = "Name:\"John Adam\" languge:\"english\" Date:\" August 2011\" Audience:(user)";
What would the generic pattern for this be? Sorry if this sounds too lame.
Thanks
Upvotes: 1
Views: 126
Reputation: 55720
First of all, I should point out that regular expressions are NOT a magic bullet. By that I mean that while they can be incredibly flexible and useful in some cases, they don't solve all text-matching problems (for instance, parsing XML-like markup).
However, in the example you gave, you could use the | syntax to specify an alternative pattern to match. An example might be:
Pattern pattern = Pattern.compile(".*?\\:(\\\".*?\\\"|\\(.*?\\))\\s*");
The section in parentheses, (\\\".*?\\\"|\\(.*?\\)), can be thought of as: find a pattern that matches either \\\".*?\\\" or \\(.*?\\). (Remember what the backslashes mean: they are escape characters, once for the Java string literal and once for the regex.)
Note, though, that this approach, while flexible, requires you to add each specific case literally, so it's not truly generic in the absolute sense.
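To see the alternation pattern on the extended input, here is a minimal, self-contained sketch. The class name AlternationSplit and its split helper are hypothetical, and trimming each match is an assumption added so that the trailing whitespace consumed by \\s* does not end up in the results.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationSplit {
    // The answer's pattern: after the colon, the group accepts either a "quoted"
    // value or a (parenthesised) one.
    static final Pattern PATTERN = Pattern.compile(".*?\\:(\\\".*?\\\"|\\(.*?\\))\\s*");

    // Hypothetical helper: collects every key:value chunk, trimmed of the
    // whitespace that the trailing \\s* consumes.
    static List<String> split(String input) {
        List<String> parts = new ArrayList<>();
        Matcher matcher = PATTERN.matcher(input);
        while (matcher.find()) {
            parts.add(matcher.group().trim());
        }
        return parts;
    }

    public static void main(String[] args) {
        String input = "Name:\"John Adam\" languge:\"english\" Date:\" August 2011\" Audience:(user)";
        for (String part : split(input)) {
            System.out.println(part);
        }
    }
}
```

Each further value format still has to be added as another branch inside the group, which is the limitation described above.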
NOTE
To better illustrate what I meant by not being able to craft a truly generic solution, here's a more generic pattern that you could use:
Pattern pattern = Pattern.compile(".*?\\:[\\\"(]{1,2}.*?[\\\")]{1,2}\\s*");
The pattern above uses character classes and is more generic, but while it will match your examples, it will also match mismatched hybrids like blah:"stuff) or, worse, blah:((stuff""
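A quick way to check the false positives is to test whole strings against the character-class pattern. This is a minimal sketch; the class name CharClassCheck and the accepts helper are hypothetical, and it uses Matcher.matches() so that the entire sample must fit the pattern.

```java
import java.util.regex.Pattern;

class CharClassCheck {
    // The answer's more generic pattern: one or two of " or ( open the value,
    // one or two of " or ) close it, and nothing ties the two sides together.
    static final Pattern PATTERN = Pattern.compile(".*?\\:[\\\"(]{1,2}.*?[\\\")]{1,2}\\s*");

    // Hypothetical helper: true if the whole string matches the pattern.
    static boolean accepts(String s) {
        return PATTERN.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "Audience:(user)",      // intended format
            "Name:\"John Adam\"",   // intended format
            "blah:\"stuff)",        // mismatched: quote opens, paren closes
            "blah:((stuff\"\"",     // mismatched and doubled
        };
        for (String s : samples) {
            System.out.println(s + " -> " + accepts(s));
        }
    }
}
```

All four samples are accepted, because the opening and closing character classes are checked independently of each other.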
Upvotes: 1
Reputation: 424983
Step 1: Remove most of those backslashes. You don't need to escape quotes or colons in the regex (they are just ordinary characters); only the Java string literal itself needs \" for a quote.
Try this pattern:
".*?:[^\\w ].*?[^\\w ]\\s*"
It treats any character that is neither a word character nor a space as a delimiter, so it works for your test case and would also work for name:'foo', etc.
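Here is a minimal sketch running the [^\\w ] pattern over the question's extended input plus a name:'foo' pair. The class name DelimiterSplit, the split helper, and trimming each match are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DelimiterSplit {
    // Any character that is not a word character or a space acts as a delimiter.
    // The colon needs no escaping.
    static final Pattern PATTERN = Pattern.compile(".*?:[^\\w ].*?[^\\w ]\\s*");

    // Hypothetical helper: collects each key:value chunk, trimmed of the
    // whitespace consumed by the trailing \\s*.
    static List<String> split(String input) {
        List<String> parts = new ArrayList<>();
        Matcher matcher = PATTERN.matcher(input);
        while (matcher.find()) {
            parts.add(matcher.group().trim());
        }
        return parts;
    }

    public static void main(String[] args) {
        String input = "Name:\"John Adam\" languge:\"english\" Date:\" August 2011\" Audience:(user) name:'foo'";
        split(input).forEach(System.out::println);
    }
}
```

Because the closing delimiter is simply the next non-word, non-space character, a value that itself contains punctuation (for example a hyphen) would be cut short at that character.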
Upvotes: 2
Reputation: 124215
You can always use the OR operator |:
Pattern pattern = Pattern.compile("(.*?\\:\\\".*?\\\"\\s*)|(.*?\\:\\(.*?\\)\\s*)");
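The sketch below runs this top-level alternation on the question's input and on a reordered string. The class name TopLevelOr and the split helper are hypothetical, and trimming each match is an added assumption. The second call shows one behaviour worth checking: the first alternative is tried first at each position, and its leading .*? can run past a (value) to reach the next :" pair.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TopLevelOr {
    // Two complete alternatives: key:"value" or key:(value).
    static final Pattern PATTERN =
        Pattern.compile("(.*?\\:\\\".*?\\\"\\s*)|(.*?\\:\\(.*?\\)\\s*)");

    // Hypothetical helper: collects each match, trimmed of trailing whitespace.
    static List<String> split(String input) {
        List<String> parts = new ArrayList<>();
        Matcher matcher = PATTERN.matcher(input);
        while (matcher.find()) {
            parts.add(matcher.group().trim());
        }
        return parts;
    }

    public static void main(String[] args) {
        // The question's input, where the (user) value comes last.
        System.out.println(split("Name:\"John Adam\" languge:\"english\" Date:\" August 2011\" Audience:(user)"));
        // With (user) first, the quote alternative's .*? runs past "Audience:(" to the
        // next :" so both pairs come back as a single match.
        System.out.println(split("Audience:(user) Name:\"John Adam\""));
    }
}
```

The earlier pattern that places the | inside the group after the colon does not have this problem, since the key is fixed by the first colon before either value format is tried.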
Upvotes: 1