Reputation: 29071
I need to parse command strings intended for cross-spawn.
From the following strings:
cmd foo bar
cmd "foo bar" --baz boom
cmd "baz \"boo\" bam"
cmd "foo 'bar bud' jim" jam
FOO=bar cmd baz
To an object:
{command: 'cmd', args: ['foo', 'bar']}
{command: 'cmd', args: ['foo bar', '--baz', 'boom']}
{command: 'cmd', args: ['baz "boo" bam']}
{command: 'cmd', args: ['foo \'bar bud\' jim', 'jam']}
{command: 'cmd', args: ['baz'], env: {FOO: 'bar'}}
I'm thinking a regex would be possible, but I'd love to avoid writing something custom. Anyone know of anything existing that could do this?
The question and answers are still valuable, but for my specific use case I no longer need to do this. I'll use spawn-command instead (more accurately, I'll use spawn-command-with-kill), which doesn't require the command and args to be separate. This will make life much easier for me. Thanks!
Upvotes: 10
Views: 4022
Reputation: 1812
A regular expression could match your whole command line...
^\s*(?:((?:(?:"(?:\\.|[^"])*")|(?:'[^']*')|(?:\\.)|\S)+)\s*)*$
... but you wouldn't be able to extract the individual words, because a capture group inside a repetition only keeps its last match. Instead, you need to match one word at a time and accumulate the words into an argument list.
function parse_cmdline(cmdline) {
    // Group 1: the next (possibly quoted) argument; group 2: the rest of the line.
    var re_next_arg = /^\s*((?:(?:"(?:\\.|[^"])*")|(?:'[^']*')|\\.|\S)+)\s*(.*)$/;
    var next_arg = ['', '', cmdline];
    var args = [];
    var quoted_part;
    while ((next_arg = re_next_arg.exec(next_arg[2]))) {
        var quoted_arg = next_arg[1];
        var unquoted_arg = "";
        // Strip the quoting from the argument one piece at a time.
        while (quoted_arg.length > 0) {
            if ((quoted_part = /^"((?:\\.|[^"])*)"(.*)$/.exec(quoted_arg))) {
                // Double-quoted piece: drop the quotes and unescape \x to x.
                unquoted_arg += quoted_part[1].replace(/\\(.)/g, "$1");
                quoted_arg = quoted_part[2];
            } else if ((quoted_part = /^'([^']*)'(.*)$/.exec(quoted_arg))) {
                // Single-quoted piece: taken literally.
                unquoted_arg += quoted_part[1];
                quoted_arg = quoted_part[2];
            } else if (/^\\./.test(quoted_arg)) {
                // Backslash escape outside quotes: keep the escaped character.
                unquoted_arg += quoted_arg[1];
                quoted_arg = quoted_arg.substring(2);
            } else {
                // Plain character (also an unbalanced quote or a trailing backslash).
                unquoted_arg += quoted_arg[0];
                quoted_arg = quoted_arg.substring(1);
            }
        }
        args.push(unquoted_arg);
    }
    return args;
}
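parse_cmdline returns a flat array of words, while the question asks for an object with command, args and env. A rough sketch of that last step is below. The helper name toSpawnSpec is made up for illustration, and it assumes that any leading words of the form NAME=value are environment assignments, as in a POSIX shell. Since quoting has already been stripped by that point, it cannot tell a quoted "FOO=bar" from a bare one.

```javascript
// Hypothetical helper (not part of any library): turns a list of words
// into the {command, args, env} shape from the question.
// Assumption: leading NAME=value words are environment assignments.
function toSpawnSpec(tokens) {
    var env = {};
    var i = 0;
    var assignment;
    // Consume leading NAME=value words until the first word that is not one.
    while (i < tokens.length &&
           (assignment = /^([A-Za-z_][A-Za-z0-9_]*)=([\s\S]*)$/.exec(tokens[i]))) {
        env[assignment[1]] = assignment[2];
        i++;
    }
    // The first remaining word is the command, the rest are its arguments.
    var spec = { command: tokens[i], args: tokens.slice(i + 1) };
    // Only attach env when at least one assignment was found.
    if (i > 0) {
        spec.env = env;
    }
    return spec;
}
```

With this, toSpawnSpec(parse_cmdline('FOO=bar cmd baz')) should give {command: 'cmd', args: ['baz'], env: {FOO: 'bar'}}. An empty word list leaves command undefined, so callers would want to check for that.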
Upvotes: 2
Reputation: 20378
You could roll your own with regex, but I'd strongly recommend looking at either minimist or yargs.
Both are battle-hardened and well supported; minimist gets about 30 million downloads a month while yargs gets nearly half that.
It's very likely you can find a way to use one or the other to get the CLI syntax you want, with the exception of env support, which IMO should be handled separately (I can't imagine why you'd want to be opinionated about environment variables being set as part of the command).
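To illustrate what handling env separately could look like, here is a small sketch that keeps the environment variables in their own object and passes them through the env option of Node's built-in child_process.spawnSync. The helper name runWithEnv and the shape of spec are assumptions for this example. cross-spawn is described as a drop-in replacement for spawn, so it should accept the same options, but I haven't checked that here.

```javascript
var childProcess = require('child_process');

// Hypothetical helper: runs a parsed command with extra environment
// variables layered on top of the current process environment.
// Assumption: spec looks like {command, args, env} with env optional.
function runWithEnv(spec) {
    // Copy process.env so the child still sees PATH and friends.
    var env = Object.assign({}, process.env, spec.env || {});
    return childProcess.spawnSync(spec.command, spec.args, {
        env: env,
        encoding: 'utf8' // return stdout/stderr as strings
    });
}
```

Passing only spec.env as the environment would drop PATH and everything else the child usually expects, which is why the copy of process.env comes first.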
Upvotes: 4
Reputation: 91585
You could use raw regular expressions, but what you're building is called a tokenizer. The reason you'd want a tokenizer is to handle context, such as quoted strings that contain spaces, which you don't want to split on.
There are existing generic libraries designed specifically for parsing and tokenization that can handle cases like strings, blocks, etc.
https://www.npmjs.com/package/js-parse
Additionally, most of these command line formats and config file formats already have parsers/tokenizers. You might want to leverage those and then normalize the results from each into your object structure.
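For a sense of what a tokenizer for this input involves, here is a minimal hand-written sketch that walks the string once and tracks whether it is inside quotes. It is not taken from js-parse or any other library; the function name tokenize and the exact rules are assumptions: single quotes are literal, a backslash escapes the next character everywhere else, and unquoted whitespace separates tokens. Real shells are stricter about which characters a backslash escapes inside double quotes.

```javascript
// Minimal tokenizer sketch (hypothetical, simplified shell-like rules).
function tokenize(input) {
    var tokens = [];
    var current = '';
    var inToken = false; // true once the current token has started (allows "")
    var quote = null;    // null, '"' or "'" depending on the open quote
    for (var i = 0; i < input.length; i++) {
        var ch = input[i];
        if (quote === "'") {
            // Inside single quotes everything is literal until the closing quote.
            if (ch === "'") { quote = null; } else { current += ch; }
        } else if (ch === '\\' && i + 1 < input.length) {
            // Backslash: take the next character literally.
            current += input[++i];
            inToken = true;
        } else if (quote === '"') {
            // Inside double quotes only the closing quote is special here.
            if (ch === '"') { quote = null; } else { current += ch; }
        } else if (ch === '"' || ch === "'") {
            // Opening quote starts (or continues) a token.
            quote = ch;
            inToken = true;
        } else if (/\s/.test(ch)) {
            // Unquoted whitespace ends the current token, if any.
            if (inToken) { tokens.push(current); current = ''; inToken = false; }
        } else {
            current += ch;
            inToken = true;
        }
    }
    if (quote !== null) {
        throw new Error('Unterminated ' + quote + ' quote');
    }
    if (inToken) { tokens.push(current); }
    return tokens;
}
```

Unlike a single regular expression, the state lives in plain variables, so adding a rule (for example, rejecting unterminated quotes as done here) is a local change.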
Upvotes: 2