jcubic

Reputation: 66590

How to split string into arguments and options in javascript

I want to get an array of arguments so I can use it with the optparse-js library. So if I have something like

-f foo -b -a -z baz bar

I want array like this

["-f", "foo", "-b", "-a", "-z", "baz", "bar"]

It should work with strings that contain escaped quotes, and with long GNU options. So far I have a regex that matches quoted strings:

/("(?:\\"|[^"])*"|'(?:\\'|[^'])*')/g

It matches strings like "das", "asd\"asd", 'asd' or 'sad\'asd'.
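A quick check of what that regex picks out of a sample line. This is only a sketch; `String.raw` is used so the backslashes reach the regex exactly as typed, and the sample line is made up from the strings listed above.

```javascript
// The question's regex for quoted strings, unchanged. Its g flag makes
// String.prototype.match return every match instead of just the first.
const quoted = /("(?:\\"|[^"])*"|'(?:\\'|[^'])*')/g;

// String.raw keeps the backslashes exactly as typed, as on a command line.
const sample = String.raw`"das" "asd\"asd" 'asd' 'sad\'asd'`;

for (const s of sample.match(quoted)) {
  console.log(s); // "das", then "asd\"asd", then 'asd', then 'sad\'asd'
}
```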

Can I use a regex for this, or do I need a parser (e.g. one built with PEG)? It would be nice if it also matched regex literals, so I can do:

-p "hello b\"ar baz" -f /^ [^ ]+ $/

UPDATE: With help from @Damask, I've created this regex:

/('(\\'|[^'])*'|"(\\"|[^"])*"|\/(\\\/|[^\/])*\/|(\\ |[^ ])+|[\w-]+)/g

It works for strings like this:

echo -p "hello b\"ar baz" -f /^ [^ ]+ $/

It returns

['echo', '-p', '"hello b\"ar baz"', '-f', '/^ [^ ]+ $/']

but it fails on strings like this:

echo "©\\\\" abc "baz"

It matches the command and two arguments instead of three arguments (see demo).
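The failure can be reproduced in a few lines. The cause appears to be that `(\\"|[^"])` treats any backslash followed by a quote as an escaped quote, without checking whether that backslash is itself escaped: the four backslashes are really two escaped backslashes, so the quote after them should close the string. A sketch using the UPDATE regex and the sample line above:

```javascript
// The UPDATE regex from the question, unchanged.
const updateRe = /('(\\'|[^'])*'|"(\\"|[^"])*"|\/(\\\/|[^\/])*\/|(\\ |[^ ])+|[\w-]+)/g;

// String.raw keeps all four backslashes.
const failing = String.raw`echo "©\\\\" abc "baz"`;

// The last backslash plus the closing quote is consumed as an "escaped quote",
// so the double-quoted alternative runs on to the opening quote of "baz".
console.log(failing.match(updateRe));
```

This yields three tokens, `echo`, `"©\\\\" abc "` and `baz"`, instead of the expected four.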

If an argument doesn't contain spaces, like "foo"baz, it should be one item in the array. The quotes need to be included; I will remove the unescaped ones from the string myself (as in bash: when you execute echo "foo"bar, echo gets a single argument, foobar).
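For the quote-removal step mentioned above, a minimal sketch of what "remove the unescaped quotes" could look like for a single token. `removeUnescapedQuotes` is a hypothetical helper, and it is deliberately simpler than bash: outside single quotes it treats a backslash as escaping any following character, whereas bash keeps a backslash inside double quotes unless it precedes `$`, `` ` ``, `"`, `\` or a newline.

```javascript
// Hypothetical helper: simplified, bash-like quote removal for ONE token.
function removeUnescapedQuotes(token) {
  let out = "";
  let quote = null; // the quote character we are currently inside, if any
  for (let i = 0; i < token.length; i++) {
    const ch = token[i];
    if (quote === "'") {
      // Inside single quotes everything is literal until the closing '
      if (ch === "'") quote = null;
      else out += ch;
    } else if (ch === "\\" && i + 1 < token.length) {
      out += token[++i]; // escaped character: keep it, drop the backslash
    } else if (quote === '"' && ch === '"') {
      quote = null; // closing double quote is dropped
    } else if (quote === null && (ch === '"' || ch === "'")) {
      quote = ch; // opening quote is dropped
    } else {
      out += ch; // ordinary character
    }
  }
  return out;
}

console.log(removeUnescapedQuotes('"foo"bar'));          // foobar
console.log(removeUnescapedQuotes(String.raw`"b\"ar"`)); // b"ar
```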

Upvotes: 6

Views: 5892

Answers (7)

Stephane Janicaud

Reputation: 3627

I really love regex but sometimes a combination of simple regex and simple function does the same job but is a lot easier to debug and maintain, especially when developers not familiar with (complex) regex join the project.

So here is another approach; see the explanation below.

It's tested with this rather complicated sample, which contains arguments with many spaces and escaped double quotes, as required:

echo "©\\\\" abc "baz" "foo bar dummy" -d "marty \\\"mc fly" -f "avb eer\"" -p 2 "asd\"asd" -a 3

Code Snippet

function commandArgs2Array(text) {
  const re = /^"[^"]*"$/; // Check if argument is surrounded with double-quotes
  const re2 = /^([^"]|[^"].*?[^"])$/; // Check if argument is NOT surrounded with double-quotes

  const arr = [];
  let argPart = null; // Incomplete (quoted) argument collected so far

  if (!text) {
    return arr;
  }

  text.split(" ").forEach(function(arg) {
    if ((re.test(arg) || re2.test(arg)) && !argPart) {
      // Complete argument: keep it as-is
      arr.push(arg);
    } else {
      argPart = argPart ? argPart + " " + arg : arg;
      // If part is complete (ends with a double quote), we can add it to the array
      if (/"$/.test(argPart)) {
        arr.push(argPart);
        argPart = null;
      }
    }
  });

  return arr;
}

let result = commandArgs2Array('echo "©\\\\" abc "baz" "foo bar  dummy" -d "marty \\\"mc fly" -f "avb eer\"" -p 2 "asd\"asd" -a 3');
console.log(result);

Explanation

First, the string is split on the space character.

For each piece, we check whether it's a complete or an incomplete argument.

A complete argument is an argument which is either

  • surrounded with double-quotes
  • NOT surrounded with double-quotes at all

Every other case represents an incomplete argument. It's either:

  • The start of an incomplete argument (starts with a double-quote)
  • An empty piece (produced by two consecutive spaces)
  • A part of an incomplete argument which can contain escaped double-quotes
  • The end of an incomplete argument (ends with a double-quote)
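One detail worth checking in the approach above: the "ends with a double quote" test (`/"$/`) also fires when that quote is escaped. For the input `"x y\" z"`, the piece `y\"` ends with a quote, so the argument would be closed too early as `"x y\"`. A minimal sketch of a stricter test, assuming a quote counts as closing only when it is preceded by an even number of backslashes; `endsWithUnescapedQuote` is a hypothetical helper, not part of the answer's code:

```javascript
// Hypothetical helper: true when text ends with a double quote that is NOT
// escaped, i.e. the quote is preceded by an even number (0, 2, ...) of
// backslashes, which are then themselves escaped backslashes.
function endsWithUnescapedQuote(text) {
  return /(^|[^\\])(\\\\)*"$/.test(text);
}

console.log(endsWithUnescapedQuote(String.raw`"x y\"`));  // false: the quote is escaped
console.log(endsWithUnescapedQuote(String.raw`"x y\\"`)); // true: the backslash is escaped
```

Such a check could stand in for `/"$/.test(argPart)`; note that `re` has the same blind spot for a single piece like `"a\"`.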

That's all, folks!

Upvotes: 5

user557597

Reputation:

Some comments:

  • The raw regex for quoted strings is:
    "[^"\\]*(?:\\[\S\s][^"\\]*)*"|'[^'\\]*(?:\\[\S\s][^'\\]*)*'
    Example: http://regex101.com/r/uxqApc/2

  • This part, (?= :? | $ ), will always resolve to true and is useless.

  • In this part, /(\\/|[^/])+/[gimy]*, if it's meant to match a regex literal (or any delimited item),
    you have to blindly handle the escaping of any character, like this: /[^/\\]*(?:\\[\S\s][^/\\]*)*/[gimy]*.
    Otherwise it would match /..\\//, which is not correct.

  • This expression, (?: \\ \s | \S )+, comes first in the alternation sequence, i.e. before [\w-]+. Since non-whitespace \S is a superset of [\w-], the [\w-]+ branch is never reached.
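The escaping point in the third bullet can be checked directly. A small comparison, assuming the raw input `/..\\//` (a delimited item ending in an escaped backslash, followed by a stray slash); the variable names are illustrative and the `[gimy]*` flags part is left out for brevity:

```javascript
// Naive form: only "\/" is treated as an escape.
const naive = /\/(\\\/|[^\/])+\//;
// Corrected form: a backslash escapes whatever character follows it.
const strict = /\/[^\/\\]*(?:\\[\S\s][^\/\\]*)*\//;

// Raw text: slash, two dots, an escaped backslash, closing slash, stray slash.
const text = String.raw`/..\\//`;

console.log(text.match(naive)[0]);  // /..\\//  (runs past the real closing slash)
console.log(text.match(strict)[0]); // /..\\/
```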

Making the corrections and putting it all back together gets this regex:
/("[^"\\]*(?:\\[\S\s][^"\\]*)*"|'[^'\\]*(?:\\[\S\s][^'\\]*)*'|\/[^\/\\]*(?:\\[\S\s][^\/\\]*)*\/[gimy]*(?=\s|$)|(?:\\\s|\S)+)/

Demos:

JavaScript - http://regex101.com/r/cuJuQ8/1
PCRE - http://regex101.com/r/cuJuQ8/2

Formatted

 (                             # (1 start)
      "
      [^"\\]* 
      (?: \\ [\S\s] [^"\\]* )*
      "
   |  
      ' 
      [^'\\]* 
      (?: \\ [\S\s] [^'\\]* )*
      '
   |  
      / 
      [^/\\]* 
      (?: \\ [\S\s] [^/\\]* )*
      /
      [gimy]* 
      (?= \s | $ )
   |  
      (?: \\ \s | \S )+
 )                             # (1 end)
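Applied with the `g` flag to the two command lines from the question, the corrected regex can be exercised like this. It's a sketch; `String.raw` is used only so the backslashes in the samples match what a user would type.

```javascript
// The corrected regex from above, with the g flag added so that
// String.prototype.match returns every token.
const argRe = /("[^"\\]*(?:\\[\S\s][^"\\]*)*"|'[^'\\]*(?:\\[\S\s][^'\\]*)*'|\/[^\/\\]*(?:\\[\S\s][^\/\\]*)*\/[gimy]*(?=\s|$)|(?:\\\s|\S)+)/g;

const cmd1 = String.raw`echo -p "hello b\"ar baz" -f /^ [^ ]+ $/`;
const cmd2 = String.raw`echo "©\\\\" abc "baz"`;

console.log(cmd1.match(argRe));
console.log(cmd2.match(argRe));
```

For the `"©\\\\"` case this yields the command plus three arguments, because each backslash is consumed together with the character after it, so a quote that follows an even number of backslashes is treated as closing.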


If you also need whitespace outside of quotes or regex literals to be the only delimiter (so that adjacent pieces such as "foo"bar form a single argument), this would be it:

/((?:"[^"\\]*(?:\\[\S\s][^"\\]*)*"|'[^'\\]*(?:\\[\S\s][^'\\]*)*'|\/[^\/\\]*(?:\\[\S\s][^\/\\]*)*\/[gimy]*(?=\s|$)|(?:\\\s|\S))+)(?=\s|$)/

Demos:

JavaScript - http://regex101.com/r/cuJuQ8/3
PCRE - https://regex101.com/r/cuJuQ8/4

Formatted

 (                             # (1 start)
      (?:
           "
           [^"\\]* 
           (?: \\ [\S\s] [^"\\]* )*
           "
        |  
           ' 
           [^'\\]* 
           (?: \\ [\S\s] [^'\\]* )*
           '
        |  
           / 
           [^/\\]* 
           (?: \\ [\S\s] [^/\\]* )*
           /
           [gimy]* 
           (?= \s | $ )
        |  
           (?: \\ \s | \S )
      )+
 )                             # (1 end)
 (?= \s | $ )
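A short sketch of the difference this makes, using the second regex with the `g` flag added; the sample line is made up for illustration:

```javascript
// Second regex from above (g flag added): quoted strings, regex literals and
// other non-space characters are glued together until real whitespace.
const gluedRe = /((?:"[^"\\]*(?:\\[\S\s][^"\\]*)*"|'[^'\\]*(?:\\[\S\s][^'\\]*)*'|\/[^\/\\]*(?:\\[\S\s][^\/\\]*)*\/[gimy]*(?=\s|$)|(?:\\\s|\S))+)(?=\s|$)/g;

const line = 'echo "foo"bar "a b"c baz';
console.log(line.match(gluedRe));
```

Here `"foo"bar` and `"a b"c` each come back as a single token, while the first regex would split them into `"foo"` + `bar` and `"a b"` + `c`.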

Upvotes: 6

Piyush Kumar

Reputation: 48

var string = "-f foo -b -a -z baz bar";
string = string.split(" ");
var stringArray = new Array();
for (var i = 0; i < string.length; i++) {
    stringArray.push(string[i]);
}
console.log(stringArray);

The console output will be:

Array [ "-f", "foo", "-b", "-a", "-z", "baz", "bar" ]

Upvotes: 0

jcubic

Reputation: 66590

OK, even though I created a bounty for this question, I found the answer with help from "Regex match even number of letters".

My regex looks like this:

/('((?:[^\\]*(?:\\\\)*\\')+|[^']*)*'|"(?:(?:[^\\]*(?:\\\\)*\\")+|[^"]*)*"|(?:\/(\\\/|[^\/])+\/[gimy]*)(?=:? |$)|(\\\s|\S)+|[\w-]+)/

(with a demo)

EDIT: @sin's suggestion makes the regex better:

/("[^"\\]*(?:\\[\S\s][^"\\]*)*"|'[^'\\]*(?:\\[\S\s][^'\\]*)*'|(?:\/(\\\/|[^\/])+\/[gimy]*)(?=:? |$)|(\\\s|\S)+|[\w-]+)/

Upvotes: -1

Cerbrus

Reputation: 72927

This will work:

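var input = '-p "hello b\"ar baz" -f /^ [^ ]+ $/ -c -d -e';
var arr = input.split(' -');
var out = [];
for (var i = 0; i < arr.length; i++) {
    // split(' -') removes the dash from every chunk after the first, so put it back
    var part = i > 0 ? '-' + arr[i] : arr[i];
    var space = part.indexOf(' ');
    if (space !== -1) {
        // "-opt value": the option name, then everything after the first space
        out.push(part.substring(0, space), part.substring(space + 1));
    } else {
        out.push(part);
    }
}
console.log(out);

Output:

["-p", ""hello b"ar baz"", "-f", "/^ [^ ]+ $/", "-c", "-d", "-e"]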

I know it's not a fancy one-line regex, but it works as expected for this input.

Upvotes: 0

Damask

Reputation: 1254

Try this:

var a = '-f foo "ds  df s\\" da" -b -a -z baz bar';
a.match(/([\w-]+|"(\\"|[^"])*")/g)

returns ["-f", "foo", ""ds  df s\" da"", "-b", "-a", "-z", "baz", "bar"]

Upvotes: 0

shift66

Reputation: 11958

Why don't you simply use the split function?

var arr = myString.split(/\s+/);

You'd better pass a regexp as the argument, to avoid bugs when the separator is \t or there are multiple spaces, etc.
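A small illustration of why the regexp separator helps, on a made-up input with a double space and a tab. The `trim()` call in the second part is an extra step not mentioned in the answer, included because leading or trailing whitespace still produces empty strings:

```javascript
// Splitting on a single " " keeps empty strings (double spaces) and does
// not split on tabs; splitting on /\s+/ handles both.
const input = "-f  foo\t-b -a -z baz bar";
console.log(input.split(" "));   // ["-f", "", "foo\t-b", "-a", "-z", "baz", "bar"]
console.log(input.split(/\s+/)); // ["-f", "foo", "-b", "-a", "-z", "baz", "bar"]

// Leading/trailing whitespace still yields empty strings at the ends.
const padded = "  -f foo ";
console.log(padded.split(/\s+/));        // ["", "-f", "foo", ""]
console.log(padded.trim().split(/\s+/)); // ["-f", "foo"]
```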

EDIT:

If your arguments contain spaces and are in quote marks, I don't think you can do it with a single regexp. I think you should first find the arguments with spaces (with /"(.*?)"/ you'll get the argument in group 1), add them to the array, then remove them from the string, and only after that use the split method as described above.
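A minimal sketch of that two-step idea, taken literally: `splitArgsTwoStep` is a hypothetical name, and this version replaces each quoted argument with a space so its neighbours stay separated. As written, the quoted arguments end up first, so the original order is lost, and escaped quotes inside arguments are not handled.

```javascript
// Hypothetical helper: 1) pull out "quoted" arguments with /"(.*?)"/g,
// 2) remove them from the string, 3) split the remainder on whitespace.
function splitArgsTwoStep(str) {
  const quoted = [];
  const rest = str.replace(/"(.*?)"/g, function (_match, inner) {
    quoted.push(inner); // group 1: the argument without its quotes
    return " ";         // keep the surrounding words separated
  });
  const trimmed = rest.trim();
  const plain = trimmed === "" ? [] : trimmed.split(/\s+/);
  return quoted.concat(plain);
}

console.log(splitArgsTwoStep('-f foo -p "hello bar baz" -b'));
// ["hello bar baz", "-f", "foo", "-p", "-b"]
```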

Upvotes: 2
