Lumaskcete

Reputation: 710

How to split a string containing multiple repeated keywords into an array in JavaScript?

I have a string like this:

const string =  'John Smith: I want to buy 100 apples\r\nI want to buy 200 oranges\r\n, and add 300 apples';

and I want to split the string by the following keywords:

const keywords = ['John smith',  '100', 'apples', '200', 'oranges', '300'];

I want to get a result like this:

const result = [
  {isKeyword: true, text: 'John Smith'},
  {isKeyword: false, text: 'I want to buy '}, 
  {isKeyword: true, text: '100'}, 
  {isKeyword: true, text:'apples'}, 
  {isKeyword: false, text:'\r\nI want to buy'}, 
  {isKeyword: true, text:'200'},
  {isKeyword: true, text:'oranges'}, 
  {isKeyword: false, text:'\r\n, and add'},
  {isKeyword: true, text:'300'},
  {isKeyword: true, text:'apples'}];

The keywords may differ in case from the text. I want each entry in the array to keep the text exactly as it appears in the string.

I also want the array to keep the same order as the string, with each piece marked as a keyword or not.

How can I do this?

Upvotes: 0

Views: 120

Answers (2)

Mark

Reputation: 92440

I would start by finding the indexes of all your keywords. From these you know where every keyword in the sentence starts and stops. You can then sort the matches by the index where each keyword starts.

Then it's just a matter of taking substrings up to the start of each keyword -- these will be the isKeyword: false substrings -- and then adding the keyword substring. Repeat until you are done.

const string = 'John Smith: I want to buy 100 apples\r\nI want to buy 200 oranges\r\n, and add 300 apples Thanks';
const keywords = ['John smith', '100', 'apples', '200', 'oranges', '300'];

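// find all [start, end] index pairs of a keyword in str (case-insensitive)
// note: kw is used as a regex pattern, so escape any regex metacharacters first
function getInd(kw, str) {
  let regex = new RegExp(kw, 'gi'), result, pos = []

  while ((result = regex.exec(str)) != null)
    pos.push([result.index, result.index + kw.length]);
  return pos
}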

// find all index of all keywords
let positions = keywords.reduce((a, word) => a.concat(getInd(word, string)), [])
positions.sort((a, b) => a[0] - b[0])

// go through the string and make the array
let start = 0, res = []

for (let next of positions) {
  if (start + 1 < next[0])
    res.push({ isKeyword: false,text: string.slice(start, next[0]).trim()})

  res.push({isKeyword: true, text: string.slice(next[0], next[1])})
  start = next[1]

}
// get any remaining text
if (start < string.length) res.push({isKeyword: false, text: string.slice(start, string.length).trim()})


console.log(res)

I'm trimming whitespace as I go, but you may want to do something different.
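If you would rather keep the text exactly as it appears, one option is to skip the trimming entirely. The following is only a sketch of the same index-based idea under that assumption: `piecesPreservingWhitespace` is a hypothetical helper name, the keywords are escaped so they match literally, and a match that overlaps an earlier keyword is simply skipped.

```javascript
// Hypothetical helper (not part of the answer above): same index-based idea,
// but every piece keeps its original whitespace.
function piecesPreservingWhitespace(str, keywords) {
  const positions = [];
  for (const kw of keywords) {
    // Escape regex metacharacters so the keyword is matched literally.
    const escaped = kw.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    for (const m of str.matchAll(new RegExp(escaped, 'gi'))) {
      positions.push([m.index, m.index + m[0].length]);
    }
  }
  // Order the matches by where they start in the string.
  positions.sort((a, b) => a[0] - b[0]);

  const res = [];
  let start = 0;
  for (const [from, to] of positions) {
    if (from < start) continue; // assumption: drop matches overlapping an earlier one
    if (from > start) res.push({ isKeyword: false, text: str.slice(start, from) });
    res.push({ isKeyword: true, text: str.slice(from, to) });
    start = to;
  }
  // Any text left after the last keyword.
  if (start < str.length) res.push({ isKeyword: false, text: str.slice(start) });
  return res;
}
```

Because nothing is trimmed, joining the `text` fields back together reproduces the original string. Whitespace-only pieces between adjacent keywords are kept; filter them out with something like `res.filter(p => p.isKeyword || p.text.trim())` if you don't want them.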

If you are willing to pick a delimiter


Here's a much more succinct way to do this, if you are willing to pick a set of delimiters that can't appear in your text. For example, {} is used below.

Here we simply wrap the keywords with the delimiter and then split them out. Grabbing the keyword with the delimiter makes it easy to tell which parts of the split are your keywords:

const string =  'John Smith: I want to buy 100 apples\r\nI want to buy 200 oranges\r\n, and add 300 apples Thanks';
const keywords = ['John smith',  '100', 'apples', '200', 'oranges', '300'];

let res = keywords.reduce((str, k ) => str.replace(new RegExp(`(${k})`, 'ig'), '{$1}'), string)
          .split(/({.*?})/).filter(i => i.trim())
          .map(s =>  s.startsWith('{') 
            ? {isKeyword: true, text: s.slice(1, s.length -1)}
            : {isKeyword: false, text: s.trim()})
            
console.log(res)

Upvotes: 2

QuentinUK

Reputation: 3077

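Use a regular expression with a capturing group, so that the matched keywords are kept in the split result (the i flag makes the match case-insensitive)

const rx = new RegExp('('+keywords.join('|')+')', 'i')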

thus

str.split(rx)
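To get from the split result to the `{isKeyword, text}` objects asked for in the question, something along these lines might work. It is a sketch, not a definitive implementation: `splitByKeywords` and `escapeRegExp` are hypothetical helper names, and it relies on the fact that splitting on a single capturing group puts the captured matches at the odd indexes of the resulting array.

```javascript
// Hypothetical helper: escape regex metacharacters so keywords match literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: split str on the keywords and tag each piece.
function splitByKeywords(str, keywords) {
  // Guard: an empty alternation would match the empty string everywhere.
  if (keywords.length === 0) {
    return str ? [{ isKeyword: false, text: str }] : [];
  }
  // Try longer keywords first so a short keyword cannot pre-empt a longer one.
  const alternatives = [...keywords]
    .sort((a, b) => b.length - a.length)
    .map(escapeRegExp)
    .join('|');
  const rx = new RegExp('(' + alternatives + ')', 'i');

  return str
    .split(rx)                                            // keywords land at odd indexes
    .map((text, i) => ({ isKeyword: i % 2 === 1, text })) // tag before filtering
    .filter(part => part.text !== '');                    // drop empty pieces
}

const string = 'John Smith: I want to buy 100 apples\r\nI want to buy 200 oranges\r\n, and add 300 apples';
const keywords = ['John smith', '100', 'apples', '200', 'oranges', '300'];
console.log(splitByKeywords(string, keywords));
```

The pieces keep their original case and whitespace, so a single space between two adjacent keywords shows up as its own non-keyword piece. The tagging happens before the empty pieces are filtered out, because the odd/even positions are only meaningful on the raw split result.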

Upvotes: 0
