Reputation: 7301
Basically, I need to split a string like
"one quoted argument" those are separate arguments "but not \"this one\""
into a list of arguments.
This regex "(\"|[^"])*"|[^ ]+
nearly does the job, but the issue is that a regular expression always (at least in Java) tries to match the longest string possible.
As a consequence, when I apply the regex to a string that starts and ends with a quoted argument, it matches the whole string and does not produce a separate match for each argument.
Is there a way to tweak this regex or the matcher or the pattern or whatever to handle that?
Note: don't tell me I could use GetOpt or CommandLine.parse or anything similar.
My concern is pure Java regex (if possible, but I doubt it...).
Upvotes: 3
Views: 2929
Reputation: 211
I came up with this one (thanks Alex for giving me the good starting point :))
/**
 * Pattern that is capable of dealing with complex command line quoting and
 * escaping. This can recognize correctly:
 * <ul>
 * <li>"double quoted strings"
 * <li>'single quoted strings'
 * <li>"escaped \"quotes within\" quoted string"
 * <li>C:\paths\like\this or "C:\path like\this"
 * <li>--arguments=like_this or "--args=like this" or '--args=like this' or
 * --args="like this" or --args='like this'
 * <li>quoted\ whitespaces\\t (spaces & tabs)
 * <li>and probably more :)
 * </ul>
 */
private static final Pattern cliCracker = Pattern
        .compile(
                "[^\\s]*\"(\\\\+\"|[^\"])*?\"|[^\\s]*'(\\\\+'|[^'])*?'|(\\\\\\s|[^\\s])+",
                Pattern.MULTILINE);
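For illustration, here is a minimal sketch of how the pattern above could be applied with Matcher.find() to collect the arguments. The class name CliCrackerDemo, the method crack and the sample input are assumptions made for this example; only the pattern itself comes from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CliCrackerDemo {
    // Pattern copied from the answer above.
    private static final Pattern cliCracker = Pattern.compile(
            "[^\\s]*\"(\\\\+\"|[^\"])*?\"|[^\\s]*'(\\\\+'|[^'])*?'|(\\\\\\s|[^\\s])+",
            Pattern.MULTILINE);

    // Hypothetical helper: returns every match of the pattern, in order.
    public static List<String> crack(String commandLine) {
        List<String> args = new ArrayList<>();
        Matcher m = cliCracker.matcher(commandLine);
        while (m.find()) {
            args.add(m.group());
        }
        return args;
    }

    public static void main(String[] unused) {
        // Raw input: --name="John Smith" 'single quoted' plain escaped\ space
        String input = "--name=\"John Smith\" 'single quoted' plain escaped\\ space";
        for (String arg : crack(input)) {
            System.out.println(arg);
        }
    }
}
```

Note that the matches keep their surrounding quotes and backslashes; stripping them would be a separate step.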
Upvotes: 2
Reputation: 21
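import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static String[] parseCommand( String cmd )
{
    if( cmd == null || cmd.length() == 0 )
    {
        return new String[] {};
    }
    cmd = cmd.trim();
    // Regex: "(\\"|[^"])*?"|[^ ]+
    // i.e. a quoted argument (which may contain \" inside) or a run of non-space characters.
    // The backslash must be doubled twice in the Java literal; "\\\"" alone would only match a bare quote.
    String regExp = "\"(\\\\\"|[^\"])*?\"|[^ ]+";
    Pattern pattern = Pattern.compile( regExp, Pattern.MULTILINE | Pattern.CASE_INSENSITIVE );
    Matcher matcher = pattern.matcher( cmd );
    List< String > matches = new ArrayList< String >();
    // Each successful find() yields one argument (quotes included).
    while( matcher.find() ) {
        matches.add( matcher.group() );
    }
    String[] parsedCommand = matches.toArray(new String[] {});
    return parsedCommand;
}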
Upvotes: 2
Reputation: 25613
You may use the non-greedy quantifier *? to make it work:
"(\\"|[^"])*?"|[^ ]+
See this link for an example in action: http://gskinner.com/RegExr/?32srs
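As a rough sketch of how this regex might be used from Java, the following applies it to the sample string from the question. The class name QuotedArgSplitter and the method split are made up for this example; note that the backslashes have to be doubled again inside the Java string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedArgSplitter {
    // The regex "(\\"|[^"])*?"|[^ ]+ written as a Java string literal.
    private static final Pattern ARG =
            Pattern.compile("\"(\\\\\"|[^\"])*?\"|[^ ]+");

    // Hypothetical helper: one list entry per match, quotes included.
    public static List<String> split(String commandLine) {
        List<String> args = new ArrayList<>();
        Matcher m = ARG.matcher(commandLine);
        while (m.find()) {
            args.add(m.group());
        }
        return args;
    }

    public static void main(String[] unused) {
        // Raw input: "one quoted argument" those are separate arguments "but not \"this one\""
        String input = "\"one quoted argument\" those are separate arguments "
                + "\"but not \\\"this one\\\"\"";
        for (String arg : split(input)) {
            System.out.println(arg);
        }
    }
}
```

On that input the loop should print six arguments: the first quoted one, the four bare words, and the last quoted one with its escaped quotes intact.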
Upvotes: 4
Reputation: 53462
"regular expression always (at least in java) tries to match the longest string possible."
Um... no.
That is controlled by whether you use greedy or non-greedy quantifiers. Using a non-greedy one (by adding a question mark after the quantifier) should do it. It's called lazy quantification.
The default is greedy, but that certainly doesn't mean it is always that way.
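A small sketch of the difference, assuming a deliberately simplified regex (".*" between quotes) rather than the one from the question. The class name GreedyVsLazy and the helper firstMatch are invented for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsLazy {
    // Hypothetical helper: returns the first match of regex in input, or null.
    public static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] unused) {
        String input = "\"a\" b \"c\"";   // raw text: "a" b "c"

        // Greedy: .* grabs as much as it can, so the match runs to the last quote.
        System.out.println(firstMatch("\".*\"", input));   // prints "a" b "c"

        // Lazy: .*? grabs as little as it can, so the match stops at the next quote.
        System.out.println(firstMatch("\".*?\"", input));  // prints "a"
    }
}
```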
Upvotes: 4