Reputation: 73
How can I split the following string
var test = 'sample "test""test2" "test3\\"" sample2"last';
into the array ['sample','"test"','"test2"','"test3\\""','sample2"last']
using a JavaScript regex?
Here are some sample inputs and the expected outputs:
sample1 : ' test1 "test2" test3 "test four\\"" test" d'
output [' test1','"test2"','test3','"test four\\""','test" d']
sample2 : ' test1 test2'
output [' test1 test2']
sample3 : ' test1 "sub test2'
output [' test1 "sub test2']
sample4 : ' test1 "sub test2"'
output [' test1 ','"sub test2"']
sample5 : ' "test1" "sub test2" here'
output ['"test1"','"sub test2"', 'here']
Upvotes: 0
Views: 1204
Reputation: 1
You can use the dqtokenizer package (https://www.npmjs.com/package/dqtokenizer):
const dqtokenizer = require('dqtokenizer');
const testTokenize = (str, options) => {
  const tokens = dqtokenizer.tokenize(str, options);
  console.log();
  console.log(`str: ${str}`);
  console.log(`tokens:`);
  tokens.forEach((token, index) => console.log(`\t${index}: ${token}`));
};
const sample1 = ' test1 "test2" test3 "test four\\"" test" d';
// output [' test1','"test2"','test3','"test four\\""','test" d']
testTokenize(sample1);
const sample2 = ' test1 test2';
// output [' test1 test2']
testTokenize(sample2);
const sample3 = ' test1 "sub test2';
// output [' test1 "sub test2']
testTokenize(sample3);
const sample4 = ' test1 "sub test2"';
// output [' test1 ','"sub test2"']
testTokenize(sample4);
const sample5 = ' "test1" "sub test2" here';
// output ['"test1"','"sub test2"', 'here']
testTokenize(sample5);
Output:

str: test1 "test2" test3 "test four\"" test" d
tokens:
	0: test1
	1: "test2"
	2: test3
	3: "test four\""
	4: test
	5: " d

str: test1 test2
tokens:
	0: test1
	1: test2

str: test1 "sub test2
tokens:
	0: test1
	1: "sub test2

str: test1 "sub test2"
tokens:
	0: test1
	1: "sub test2"

str: "test1" "sub test2" here
tokens:
	0: "test1"
	1: "sub test2"
	2: here
Upvotes: 0
Reputation: 785098
This regex should work for you. It matches the pieces themselves rather than the separators, so use it with match() instead of split():
/\s*"[^"\\]*(?:\\.[^"\\]*)*"\s*|.+?(?="[^"\\]*(?:\\.[^"\\]*)*"|$)/g
Code:
var input = [` test1 "test2" test3 "test four\\"" test" d`, ` test1 test2`, ` test1 "sub test2`, ` test1 "sub test2"`, ` "test1" "sub test2" here`];
const re = /\s*"[^"\\]*(?:\\.[^"\\]*)*"\s*|.+?(?="[^"\\]*(?:\\.[^"\\]*)*"|$)/g;
input.forEach(el => {
  console.log('<<', el, '>>');
  var arr = el.match(re);
  arr.forEach(i => console.log(i));
});
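The same regex can be wrapped in a small reusable function. This is only a sketch: the name splitQuoted is hypothetical, and the null fallback is an addition, because match() returns null when nothing matches (for example, for an empty string). Because the regex has the g flag, match() returns all pieces and resets lastIndex itself, so reusing one regex object is safe.

```javascript
// Same pattern as above: a quoted string (escaped quotes allowed, plus any
// surrounding whitespace) OR the shortest run of characters that is followed
// by a quoted string or by the end of the input.
const re = /\s*"[^"\\]*(?:\\.[^"\\]*)*"\s*|.+?(?="[^"\\]*(?:\\.[^"\\]*)*"|$)/g;

// Hypothetical helper: match() returns null when there is nothing to match
// (e.g. for ''), so fall back to an empty array.
function splitQuoted(str) {
  return str.match(re) ?? [];
}

console.log(splitQuoted(' test1 "sub test2"')); // [ ' test1 ', '"sub test2"' ]
console.log(splitQuoted(' test1 "sub test2'));  // [ ' test1 "sub test2' ]
```

Note that the \s* on both sides of the quoted-string branch keeps adjacent whitespace attached to the quoted piece, so for sample5 the result is [' "test1" ', '"sub test2" ', 'here'] rather than trimmed strings.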
RegEx Details:
- "[^"\\]*(?:\\.[^"\\]*)*": Match a quoted string; escaped quotes (\") inside it do not end the match
- |: OR
- .+?(?="[^"\\]*(?:\\.[^"\\]*)*"|$): Match 1 or more characters (lazily) that must be followed by a quoted string or the end of the line
Upvotes: 1
Reputation: 257
A pure regex solution: / +|(?<!\\")(?<=")(?=")/
This matches either one or more spaces, or the empty position that follows a " (but not an escaped \") and precedes another ".
var test = 'sample "test""test2" "test3\\"" sample2"last';
console.log(test.split(/ +|(?<!\\")(?<=")(?=")/));
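The same split can be packaged as a function for the other sample inputs. This is a sketch: the name splitTokens is hypothetical, and filtering out empty strings is an added assumption, since a leading or trailing space makes split() return an empty first or last element. Lookbehind requires ES2018 support.

```javascript
// The split regex from above: one or more spaces, OR the empty position that
// follows a " (but not an escaped \") and precedes another ".
const re = / +|(?<!\\")(?<=")(?=")/;

// Hypothetical helper: split, then drop the empty strings that leading or
// trailing spaces leave behind.
function splitTokens(str) {
  return str.split(re).filter(s => s !== '');
}

console.log(splitTokens('sample "test""test2" "test3\\"" sample2"last'));
// [ 'sample', '"test"', '"test2"', '"test3\\""', 'sample2"last' ]
console.log(splitTokens(' "test1" "sub test2" here'));
// [ '"test1"', '"sub', 'test2"', 'here' ]
```

Because every space is a separator, spaces inside quotes are split too: "sub test2" comes back as two pieces.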
Upvotes: 0
Reputation: 28196
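A little convoluted, but it does the job: first replace the escaped quotes (\") with a placeholder x, then collect the "-enclosed string parts by calling the RegExp.exec() method repeatedly:
var test = 'sample "test""test2" "test3\\"" sample2"';
var x = '@#@', xr = RegExp(x, 'g'); // placeholder for \", and a regex that finds it
var rx = /"[^"]+"/g; // matches "-enclosed strings
var masked = test.replace(/\\"/g, x); // hide \" so it cannot end a match
var a, arr = [];
while ((a = rx.exec(masked)))
  arr.push(a[0].replace(/"/g, '').replace(xr, '"')); // strip the quotes, restore the escaped ones
console.log(arr);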
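The same placeholder idea can be sketched with String.prototype.matchAll (ES2020) instead of the manual exec() loop. The function name quotedParts and its placeholder parameter are hypothetical; like the code above, it assumes the placeholder never appears in the input, and it returns only the quoted parts, without their quotes.

```javascript
// Hypothetical helper: mask escaped quotes, collect "-enclosed runs with
// matchAll, then unmask. Assumes the placeholder never occurs in the input.
function quotedParts(str, placeholder = '@#@') {
  // Hide escaped quotes (\") so they cannot end a "-enclosed match.
  const masked = str.split('\\"').join(placeholder);
  // For each "-enclosed run: drop the outer quotes, then turn the
  // placeholder back into a plain " (as the exec() version does).
  return [...masked.matchAll(/"[^"]+"/g)].map(m =>
    m[0].slice(1, -1).split(placeholder).join('"')
  );
}

console.log(quotedParts('sample "test""test2" "test3\\"" sample2"'));
// [ 'test', 'test2', 'test3"' ]
```

Using split()/join() for the placeholder avoids having to escape it for a RegExp.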
Upvotes: 0
Reputation: 37367
If you can use negative lookbehind, you can use this pattern:
test.split(/(?<!\\)"/).map(i => i.trim()).filter(i => i != '')
Note that negative lookbehind is a fairly recent addition to JavaScript (ES2018). It is supported in V8, which is used, for example, in Chrome.
If you can't use negative lookbehind, use a workaround: reverse the string, split with a negative lookahead, then reverse again:
test
  .split('')
  .reverse()
  .join('')
  .split(/"(?!\\)/)
  .map(i => i.trim())
  .filter(i => i != '')
  .map(i => i.split('').reverse().join(''))
  .reverse()
Patterns used:
- "(?!\\) - negative lookahead: match " which is not followed by \
- (?<!\\)" - negative lookbehind: match " which is not preceded by \
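Both variants can be written as functions and checked against each other. This is a sketch: the names splitLookbehind, splitReversed and reverse are hypothetical, and reversing with split('') works per UTF-16 code unit, which is fine for ASCII input like the question's. The trick works because, in the reversed string, an escaped \" turns into "\, so "a quote followed by a backslash" identifies the escaped ones.

```javascript
// Variant 1: split on " not preceded by \ (needs lookbehind, ES2018).
const splitLookbehind = str =>
  str.split(/(?<!\\)"/).map(s => s.trim()).filter(s => s !== '');

// Helper: reverse a string (per UTF-16 code unit; fine for ASCII input).
const reverse = s => s.split('').reverse().join('');

// Variant 2, no lookbehind: in the reversed string an escaped \" becomes "\,
// so split on " not followed by \, then undo both reversals.
const splitReversed = str =>
  reverse(str)
    .split(/"(?!\\)/)
    .map(s => s.trim())
    .filter(s => s !== '')
    .map(reverse) // restore each piece's character order
    .reverse();   // restore the order of the pieces

const input = 'sample "test""test2" "test3\\"" sample2"last';
console.log(splitLookbehind(input));
console.log(splitReversed(input));
// both: [ 'sample', 'test', 'test2', 'test3\\"', 'sample2', 'last' ]
```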
Upvotes: 0
Reputation: 12717
You can split the string on non-word characters (\W, i.e. anything other than letters, digits and underscore), then remove any empty elements.
var test = 'sample "test""test2" "test3\"" sample2"';
var array = test.split(/\W/g).filter(e => e.length>0);
console.log(array);
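A slightly tighter sketch of the same idea splits on runs of non-word characters (\W+) so that consecutive separators do not produce empty strings in the middle; the filter still removes any left at the start or end. The name words is hypothetical. Note that only the word characters survive, so quotes and backslashes are not kept.

```javascript
// \W matches any character that is not a letter, digit or underscore.
// Splitting on runs of them (\W+) avoids empty strings between adjacent
// separators; the filter drops any left at the start or end of the input.
const words = str => str.split(/\W+/).filter(w => w.length > 0);

console.log(words('sample "test""test2" "test3\\"" sample2"last'));
// [ 'sample', 'test', 'test2', 'test3', 'sample2', 'last' ]
```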
Upvotes: 2