Reputation: 66590
I want to get an array of arguments so I can use it with the optparse-js library. If I have something like
-f foo -b -a -z baz bar
I want an array like this:
["-f", "foo", "-b", "-a", "-z", "baz", "bar"]
It should work with strings that contain escaped quotes and with long GNU options. So far I have a regex that matches quoted strings:
/("(?:\\"|[^"])*"|'(?:\\'|[^'])*')/g
It matches strings like "das"
or "asd\"asd"
or 'asd'
or 'sad\'asd'
Can I use a regex for this, or do I need a parser (for example, one built with PEG)? It would be nice if it matched regex literals too, so I can do
-p "hello b\"ar baz" -f /^ [^ ]+ $/
UPDATE: with help from @Damask I've created this regex:
/('(\\'|[^'])*'|"(\\"|[^"])*"|\/(\\\/|[^\/])*\/|(\\ |[^ ])+|[\w-]+)/g
It works for strings like this:
echo -p "hello b\"ar baz" -f /^ [^ ]+ $/
It returns
['echo', '-p', '"hello b\"ar baz"', '-f', '/^ [^ ]+ $/']
but it fails on strings like this:
echo "©\\\\" abc "baz"
It matches the command and two arguments instead of three arguments (demo).
If an argument has no spaces, like "foo"baz, it should be one item in the array. The quotes need to be included in the match, but I will remove the unescaped ones from the string afterwards (as in bash, where executing echo "foo"bar gives echo a single foobar argument).
Upvotes: 6
Views: 5892
Reputation: 3627
I really love regex, but sometimes a combination of a simple regex and a simple function does the same job and is a lot easier to debug and maintain, especially when developers unfamiliar with (complex) regex join the project.
So here is another approach; see the explanation below.
It's tested using this rather complicated sample, with arguments containing many spaces or escaped double quotes as required:
echo "©\\\\" abc "baz" "foo bar dummy" -d "marty \\\"mc fly" -f "avb eer\"" -p 2 "asd\"asd" -a 3
Code Snippet
function commandArgs2Array(text) {
  const re = /^"[^"]*"$/; // Check if argument is surrounded with double-quotes
  const re2 = /^([^"]|[^"].*?[^"])$/; // Check if argument is NOT surrounded with double-quotes
  let arr = [];
  let argPart = null;
  text && text.split(" ").forEach(function(arg) {
    if ((re.test(arg) || re2.test(arg)) && !argPart) {
      arr.push(arg);
    } else {
      argPart = argPart ? argPart + " " + arg : arg;
      // If part is complete (ends with a double quote), we can add it to the array
      if (/"$/.test(argPart)) {
        arr.push(argPart);
        argPart = null;
      }
    }
  });
  return arr;
}
let result = commandArgs2Array('echo "©\\\\" abc "baz" "foo bar dummy" -d "marty \\\"mc fly" -f "avb eer\"" -p 2 "asd\"asd" -a 3');
console.log(result);
Explanation
First, the arguments are split on the space character.
For each argument, we check whether it is complete or incomplete.
A complete argument is one that is either
- surrounded with double-quotes
- NOT surrounded with double-quotes at all
Every other case represents an incomplete argument. It's either
- The start of an incomplete argument (starts with a double-quote)
- A space
- A part of an incomplete argument which can contain escaped double-quotes
- The end of an incomplete argument (ends with a double-quote)
That's all folks !
Upvotes: 5
Reputation:
Some comments:
The raw regex for quotes is this
"[^"\\]*(?:\\[\S\s][^"\\]*)*"|'[^'\\]*(?:\\[\S\s][^'\\]*)*'
Example: http://regex101.com/r/uxqApc/2
This part (?= :? | $ )
will always resolve to true, and is useless
This part /(\\/|[^/])+/[gimy]*
If this is a regex (or any delimited item), you have to blindly handle an escaped anything, like this: /[^/\\]*(?:\\[\S\s][^/\\]*)*/[gimy]*
Otherwise it would match /..\\//
which is not correct.
This expression (?: \\ \s | \S )+
is first in the alternation sequence, i.e. before this one: [\w-]+
Since not-whitespace \S is a superset of [\w-], this [\w-]+ never, ever gets reached.
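A quick way to see the ordering problem is to check which branch of the alternation actually matches. This is only a minimal sketch: the named groups `first` and `second` and the helper `branchesUsed` are made up for illustration and are not part of the original regex.

```javascript
// Illustrative only: named groups reveal which alternative produced each match.
const ordered = /(?<first>(?:\\\s|\S)+)|(?<second>[\w-]+)/g;

// Hypothetical helper: returns "first" or "second" for every match found.
function branchesUsed(input) {
  const used = [];
  for (const m of input.matchAll(ordered)) {
    used.push(m.groups.first !== undefined ? "first" : "second");
  }
  return used;
}

// Every token is consumed by the \S branch, so "second" never shows up.
console.log(branchesUsed("-f foo --long-opt"));
```

Because the regex engine tries alternatives left to right, anything `[\w-]+` could match has already been taken by the `\S` branch.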
Making the corrections and putting it all back together gets this regex:
/("[^"\\]*(?:\\[\S\s][^"\\]*)*"|'[^'\\]*(?:\\[\S\s][^'\\]*)*'|\/[^\/\\]*(?:\\[\S\s][^\/\\]*)*\/[gimy]*(?=\s|$)|(?:\\\s|\S)+)/
Demos:
JavaScript - http://regex101.com/r/cuJuQ8/1
PCRE - http://regex101.com/r/cuJuQ8/2
( # (1 start)
"
[^"\\]*
(?: \\ [\S\s] [^"\\]* )*
"
|
'
[^'\\]*
(?: \\ [\S\s] [^'\\]* )*
'
|
/
[^/\\]*
(?: \\ [\S\s] [^/\\]* )*
/
[gimy]*
(?= \s | $ )
|
(?: \\ \s | \S )+
) # (1 end)
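For reference, here is a hedged sketch of how the corrected pattern could be applied to the string from the question that failed. The `g` flag and the names `tokenRe`, `input` and `tokens` are my additions; `String.raw` is used only so the backslashes reach the regex exactly as typed.

```javascript
// The corrected pattern from above, with the `g` flag added (an assumption)
// so that String.prototype.match returns every token instead of the first one.
const tokenRe = /("[^"\\]*(?:\\[\S\s][^"\\]*)*"|'[^'\\]*(?:\\[\S\s][^'\\]*)*'|\/[^\/\\]*(?:\\[\S\s][^\/\\]*)*\/[gimy]*(?=\s|$)|(?:\\\s|\S)+)/g;

// String.raw keeps all four backslashes inside the first quoted argument.
const input = String.raw`echo "©\\\\" abc "baz"`;

// The command plus three arguments, quotes still included.
const tokens = input.match(tokenRe);
console.log(tokens);
```

Each backslash inside quotes consumes the character after it, so an even run of backslashes before the closing quote no longer swallows that quote.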
If you also need to parse this so that a space (outside of quotes or a regex) acts as the delimiter, this would be it:
/((?:"[^"\\]*(?:\\[\S\s][^"\\]*)*"|'[^'\\]*(?:\\[\S\s][^'\\]*)*'|\/[^\/\\]*(?:\\[\S\s][^\/\\]*)*\/[gimy]*(?=\s|$)|(?:\\\s|\S))+)(?=\s|$)/
Demos:
JavaScript - http://regex101.com/r/cuJuQ8/3
PCRE - https://regex101.com/r/cuJuQ8/4
Formatted
( # (1 start)
(?:
"
[^"\\]*
(?: \\ [\S\s] [^"\\]* )*
"
|
'
[^'\\]*
(?: \\ [\S\s] [^'\\]* )*
'
|
/
[^/\\]*
(?: \\ [\S\s] [^/\\]* )*
/
[gimy]*
(?= \s | $ )
|
(?: \\ \s | \S )
)+
) # (1 end)
(?= \s | $ )
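A small usage sketch for the space-delimited version, assuming the `g` flag is added. The wrapper name `splitArgs` is hypothetical. It shows that a quoted piece glued to a bare word, such as "foo"baz, comes back as a single item.

```javascript
// Space-delimited variant from above, with the `g` flag added (an assumption).
const argRe = /((?:"[^"\\]*(?:\\[\S\s][^"\\]*)*"|'[^'\\]*(?:\\[\S\s][^'\\]*)*'|\/[^\/\\]*(?:\\[\S\s][^\/\\]*)*\/[gimy]*(?=\s|$)|(?:\\\s|\S))+)(?=\s|$)/g;

// Hypothetical wrapper: returns an empty array instead of null when nothing matches.
function splitArgs(line) {
  return line.match(argRe) || [];
}

// "foo"baz stays together; the quoted string with an escaped quote is one item.
console.log(splitArgs(String.raw`echo "foo"baz -p "hello b\"ar baz"`));
```

The quotes are still part of each item, so removing the unescaped ones is left as a separate step.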
Upvotes: 6
Reputation: 48
var string = "-f foo -b -a -z baz bar";
string = string.split(" ");
var stringArray = new Array();
for(var i =0; i < string.length; i++){
stringArray.push(string[i]);
}
console.log(stringArray);
The console output will look like this:
Array [ "-f", "foo", "-b", "-a", "-z", "baz", "bar" ]
Upvotes: 0
Reputation: 66590
OK, even though I created a bounty for this question, I found the answer with help from "Regex match even number of letters",
and my regex looks like this:
/('((?:[^\\]*(?:\\\\)*\\')+|[^']*)*'|"(?:(?:[^\\]*(?:\\\\)*\\")+|[^"]*)*"|(?:\/(\\\/|[^\/])+\/[gimy]*)(?=:? |$)|(\\\s|\S)+|[\w-]+)/
EDIT: @sin's suggestion makes a better regex:
/("[^"\\]*(?:\\[\S\s][^"\\]*)*"|'[^'\\]*(?:\\[\S\s][^'\\]*)*'|(?:\/(\\\/|[^\/])+\/[gimy]*)(?=:? |$)|(\\\s|\S)+|[\w-]+)/
Upvotes: -1
Reputation: 72927
This will work:
var input = '-p "hello b\"ar baz" -f /^ [^ ]+ $/ -c -d -e'
var arr = input.split(' -');
var out = [];
for(var i = 0; i < arr.length; i++){
    // split(' -') strips the dash from every piece except the first; restore it
    var arg = i > 0 ? '-' + arr[i] : arr[i];
    var space = arg.indexOf(' ');
    if(~space){
        // option followed by a value: split at the first space
        out = out.concat([arg.substring(0, space), arg.substring(space + 1)]);
    }else{
        out = out.concat(arg);
    }
}
Output:
["-p", ""hello b"ar baz"", "-f", "/^ [^ ]+ $/", "-c", "-d", "-e"]
I know it's not a fancy one-line regex, but it works as expected.
Upvotes: 0
Reputation: 1254
Try this:
var a = '-f foo "ds df s\\" da" -b -a -z baz bar';
a.match(/([\w-]+|"(\\"|[^"])*")/g)
returns [ "-f", "foo", ""ds df s\" da"", "-b", "-a", "-z", "baz", "bar"]
Upvotes: 0
Reputation: 11958
Why don't you simply use the split function?
var arr = myString.split(/\s+/);
It's better to pass a regexp as the argument to avoid bugs when the separator is \t or there are multiple spaces, etc.
EDIT:
If your arguments contain spaces and are in quote marks, I don't think you can find a single regexp. I think you should first find the arguments with spaces (with /"(.*?)"/ you'll get the argument in group 1), add them to the array, then remove them from the string, and only after that use the split method as described above.
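The two-step idea could be sketched roughly like this. The function name `twoStepSplit` and its return shape are assumptions for illustration, not something from the answer. Note that this loses the original argument order and does not handle escaped quotes inside a quoted argument.

```javascript
// Hypothetical helper illustrating the two-step idea described above.
function twoStepSplit(line) {
  const quoted = [];
  // Step 1: collect the content of each double-quoted argument (group 1)
  // and remove the quoted text from the string.
  const rest = line.replace(/"(.*?)"/g, function (whole, inner) {
    quoted.push(inner);
    return " ";
  });
  // Step 2: split what is left on runs of whitespace, dropping empty pieces.
  const plain = rest.split(/\s+/).filter(Boolean);
  return { quoted: quoted, plain: plain };
}

// Tabs and repeated spaces are handled by /\s+/.
console.log(twoStepSplit('-f "foo bar" -b\t-a   -z "baz qux" end'));
```

If the position of each quoted argument matters, the quoted parts would need placeholders instead of being removed outright.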
Upvotes: 2