joshua miller

Reputation: 1756

Efficient string manipulation in Javascript

I have a string (HTML content) and an array of position (index) objects. The string length is about 1.6 million characters and there are about 700 position objects.

For example:

var content = '<html><body><div class="c1">this is some text</div>....';
var positions = [{start: 20, end: 25}, {start: 35, end: 37}....]

I have to insert an opening span tag at every start position within the string and a closing span tag at every end position within the string.

What is the most efficient way to do this?

So far I have tried sorting the positions array in reverse order, then looping through it and inserting the tags by slicing and concatenating, e.g.:

content = content.slice(0, endPosition) + "</span>" + content.substring(endPosition);
content = content.slice(0, startPosition) + "<span>" + content.slice(startPosition);

(Note that I start the loop from the end so that each insertion does not shift the start/end positions that are still to be processed.)

But this takes about 3 seconds, which seems slow and inefficient to me.
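For reference, here is a self-contained sketch of the reverse-order loop described above, plus a rough timing harness on synthetic data of a similar size (1.6 million characters, 700 spans). The function name `insertSpansReverse`, the `{start, end}` shape with `end` treated as exclusive, and the assumption that spans do not overlap are illustrative choices, not taken from the question's real code. Timings depend on the engine and machine, so none are claimed here; `performance.now()` assumes a browser or Node 16+.

```javascript
// Sketch of the reverse-order slice-and-concatenate approach.
// Assumes positions are {start, end} objects, `end` is exclusive, and spans do not overlap.
function insertSpansReverse(content, positions) {
  // Sort a copy by start, descending, so an insertion never shifts indices still to be processed.
  const sorted = positions.slice().sort((a, b) => b.start - a.start);
  for (const { start, end } of sorted) {
    // Insert the closing tag first: it lies after `start`, so `start` stays valid.
    content = content.slice(0, end) + '</span>' + content.slice(end);
    content = content.slice(0, start) + '<span>' + content.slice(start);
  }
  return content;
}

// Small example: wraps "bc" and "ef".
console.log(insertSpansReverse('abcdefg', [{ start: 1, end: 3 }, { start: 4, end: 6 }]));
// → a<span>bc</span>d<span>ef</span>g

// Rough timing on synthetic input (hypothetical data, not the question's HTML).
const big = 'x'.repeat(1600000);
const spans = Array.from({ length: 700 }, (_, i) => ({ start: i * 2000, end: i * 2000 + 10 }));
const t0 = performance.now();
const out = insertSpansReverse(big, spans);
console.log(`reverse loop: ${(performance.now() - t0).toFixed(1)} ms, output length ${out.length}`);
```

Every iteration builds a new string from slices of the previous one, which is why the total work grows with both the string length and the number of spans.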

What is a more efficient way to do this?

Upvotes: 7

Views: 1412

Answers (4)

Arvind

Reputation: 1016

You can do this:

const content = 'this is some text. this is some text. this is some text. this is some text. this is some text. this is some text. this is some text. this is some text. ';
const positions = [{start: 20, end: 26}, {start: 35, end: 37}];

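// Using a Set removes duplicate positions.
const starts = new Set();
const ends = new Set();

const START_TAG = '<span>';
const END_TAG = '</span>';

const string_length = content.length;

positions.forEach(function(position) {
   const _start = position.start;
   const _end = position.end;

   // Check that the index positions are in bounds (an end equal to the length closes at the very end).
   if (_start >= 0 && _start < string_length) starts.add(_start);
   if (_end >= 0 && _end <= string_length) ends.add(_end);
});

// Merge both sets into one list and process it from the highest index down,
// so each insertion leaves the smaller indices that are still pending untouched.
const insertions = [];
starts.forEach(function(position) { insertions.push({position: position, tag: START_TAG}); });
ends.forEach(function(position) { insertions.push({position: position, tag: END_TAG}); });
// At the same index, insert the start tag first so the end tag ends up in front of it.
insertions.sort(function(a, b) {
  return b.position - a.position || (a.tag === START_TAG ? -1 : 1);
});

let updated_string = content;

insertions.forEach(function(insertion) {
  const position = insertion.position;
  updated_string = updated_string.slice(0, position) + insertion.tag + updated_string.slice(position);
});

console.log(updated_string);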

Upvotes: 1

Yosvel Quintero

Reputation: 19070

You can do:

const content = '<div class="c1">It is a long established fact that a reader will be distracted by the readable content of a page when looking at its layout. The point of using Lorem Ipsum is that it has a more-or-less normal distribution of letters, as opposed to using Content here, content here, making it look like readable English. Many desktop publishing packages and web page editors now use Lorem Ipsum as their default model text, and a search for lorem ipsum will uncover many web sites still in their infancy. Various versions have evolved over the years, sometimes by accident, sometimes on purpose (injected humour and the like).</div>';
const positions = [{start: 24,end: 40}, {start: 160,end: 202}];
// The positions must be sorted by start and must not overlap.
const {array, lastPosition} = positions.reduce((a, c) => {
  a.array.push(
    content.slice(a.lastPosition, c.start), '<span class="blue">', content.slice(c.start, c.end), '</span>'
  );
  a.lastPosition = c.end;
  return a;
}, {array: [], lastPosition: 0});

// Append the text after the last span (this also covers an empty positions array).
array.push(content.slice(lastPosition));
const result = array.join('');

document.write(result);

CSS:

.blue {color: blue;}
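This reduce walks the positions from left to right, so it assumes the array is sorted by `start` and that no two spans overlap. A minimal sketch of a helper that enforces this up front; the name `sortedNonOverlapping` and the choice to throw a `RangeError` on a bad span are assumptions for illustration.

```javascript
// Hypothetical helper: returns a copy of `positions` sorted by start,
// throwing if a span is inverted, out of range, or overlaps the previous one.
function sortedNonOverlapping(positions, length) {
  const sorted = positions.slice().sort((a, b) => a.start - b.start);
  let prevEnd = 0;
  for (const { start, end } of sorted) {
    // start < prevEnd also rejects negative starts on the first span.
    if (start < prevEnd || end < start || end > length) {
      throw new RangeError(`bad span {start: ${start}, end: ${end}}`);
    }
    prevEnd = end;
  }
  return sorted;
}

// Usage: call sortedNonOverlapping(positions, content.length).reduce(...) instead of positions.reduce(...).
console.log(sortedNonOverlapping([{ start: 160, end: 202 }, { start: 24, end: 40 }], 500));
```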

Upvotes: 1

qiAlex

Reputation: 4346

We can split the content into an array of characters, then do a single loop to insert <span> and </span>, and then join it back into a string:

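var content = '<html><body><div class="c1">this is some text</div>....';
var positions = [{start: 20, end: 25}, {start: 35, end: 37}];
var arr = content.split('');

// Sets give constant-time lookups; indexOf on an array would rescan every position for every character.
var starts = new Set(positions.map(p => p.start));
var ends = new Set(positions.map(p => p.end));

var result = arr.map((char, i) => {
  var prefix = '';
  // Close before opening, so a span ending where the next one starts gives "</span><span>".
  if (ends.has(i)) prefix += '</span>';
  if (starts.has(i)) prefix += '<span>';
  return prefix + char;
}).join('');

// map never visits index content.length, so close a span that ends at the very end here.
if (ends.has(content.length)) result += '</span>';

console.log(result);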

Upvotes: 1

georg

Reputation: 214949

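Instead of modifying the big string each time, try accumulating the processed "chunks" in a new buffer and joining them once at the end. Each slice-and-concatenate step on the big string can end up copying all 1.6 million characters, so roughly 1,400 insertions can add up to billions of character copies; collecting the chunks and calling join once builds the result in a single pass: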

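const content = '0123456789';
// Each pair is [start, end]; pairs must be sorted by start and must not overlap.
// With the question's {start, end} objects, use: for (const {start: s, end: e} of positions)
const positions = [
  [1, 3],
  [5, 7]
];

const buf = [];
let lastPos = 0;

for (const [s, e] of positions) {
  buf.push(
    content.slice(lastPos, s),
    '<SPAN>',
    content.slice(s, e),
    '</SPAN>'
  );
  lastPos = e;
}

// Text after the last span.
buf.push(content.slice(lastPos));

const res = buf.join('');
console.log(res); // 0<SPAN>12</SPAN>34<SPAN>56</SPAN>789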
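If the spans might arrive unsorted or be nested inside one another, one way to keep the single `join` is to turn every start and end into an insertion event, sort the events by index once, and copy the text between consecutive events. A sketch under those assumptions: `wrapSpans` and its tag parameters are hypothetical names, `end` is exclusive as in the code above, spans are assumed to nest properly (no partial overlaps), and empty or inverted ranges are skipped.

```javascript
// Hypothetical helper: wraps each {start, end} range (end exclusive) in openTag/closeTag
// using one sort of the 2k tag events and a single join over the text.
function wrapSpans(content, positions, openTag = '<span>', closeTag = '</span>') {
  const events = [];
  for (const { start, end } of positions) {
    if (end <= start) continue; // assumption: empty or inverted ranges are ignored
    events.push({ at: start, open: true, other: end });
    events.push({ at: end, open: false, other: start });
  }
  // Sort by index. At the same index: closing tags before opening tags,
  // outer spans (larger end) open first, and inner spans (larger start) close first.
  events.sort((a, b) =>
    a.at - b.at || (a.open !== b.open ? (a.open ? 1 : -1) : b.other - a.other)
  );

  const buf = [];
  let lastPos = 0;
  for (const { at, open } of events) {
    buf.push(content.slice(lastPos, at), open ? openTag : closeTag);
    lastPos = at;
  }
  buf.push(content.slice(lastPos));
  return buf.join('');
}

// Unsorted input with a nested span and two adjacent spans.
console.log(wrapSpans('0123456789', [
  { start: 5, end: 7 }, { start: 1, end: 4 }, { start: 2, end: 3 }, { start: 7, end: 9 }
]));
// → 0<span>1<span>2</span>3</span>4<span>56</span><span>78</span>9
```

Compared with the plain buffer, the only extra work is sorting the 2k events (about 1,400 for the question's data), which is small next to a 1.6-million-character string.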

Upvotes: 4
