JS: shorten search results around the words that were found

Question

I'm searching the title and description fields in my application and need to shorten the search results. For example, if a description is too long (more than two lines), the result should be cut down to one or two lines of text around the found word, with that word highlighted.

Algolia's search results show the kind of output I'm after.

Here's what I've tried so far, but it's not working as expected:

  const truncateHighlightedText = (
    sentence,
    searchExpression,
    truncateLength
  ) => {
    const pattern = new RegExp(
      '\\b.{1,' +
        truncateLength +
        '}\\b' +
        searchExpression +
        '\\b.{1,' +
        truncateLength +
        '}\\b',
      'i'
    );

    return sentence.match(pattern);
  };


const sentence = "Testsdfgbsegsegsrewgserwfgvsrevfse  ewrwer wergwregew    erwgrewgwerg   erwgwr eerg rg g er egr ew  erger  rtggrt tr ert tr tr tg tgr gtr  gtr egrt rtg trg rg e eg";
const searchExpression = "egseg";
const truncateLength = 20;


const result = truncateHighlightedText(sentence, searchExpression, truncateLength);
console.log(result);

https://jsfiddle.net/dwr3qgs0/1/

What is the best approach for this task?

CertainPerformance · Accepted Answer

Your code currently doesn't match anything, for two reasons:

1. You're using word boundaries (\b) around searchExpression, so only a standalone word will match. In the code in your question, egseg is not a standalone word anywhere. In the code in the fiddle, eg is a standalone word, but it exists only at the very end of the string.
2. You're requiring at least one character before and after the matched word with {1,' + truncateLength + '}. When eg is the last thing in the string, there is nothing after it to satisfy that part of the pattern, which is why the eg in the fiddle isn't matched.
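Both problems can be checked in isolation. The sketch also shows a related pitfall when a pattern is assembled from strings: in a JavaScript string literal, '\b' is a backspace character (U+0008), so the string must contain '\\b' to produce the regex token \b. The text and window size n are made-up test values, not taken from the question.

```javascript
// In a string literal, '\b' is a backspace character (code 8), not a word boundary.
console.log('\b'.charCodeAt(0)); // 8
// '\\b' is a two-character string: a backslash followed by "b".
console.log('\\b'.length); // 2

// Hypothetical test values: "eg" appears as a standalone word only at the end.
const text = 'trg rg e eg';
const n = 5;

// {1,n} demands at least one character after "eg", so the final "eg" cannot match.
const atLeastOne = new RegExp('.{1,' + n + '}\\beg\\b.{1,' + n + '}', 'i');
// {0,n} lets the snippet stop right after "eg" at the end of the string.
const zeroOrMore = new RegExp('.{0,' + n + '}\\beg\\b.{0,' + n + '}', 'i');

console.log(text.match(atLeastOne)); // null
console.log(text.match(zeroOrMore)[0]); // "rg e eg"
```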

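If you want to match searchExpression anywhere, remove the word boundaries around it (keeping only the outer \b tokens, so the snippet never starts or ends in the middle of a word), and use {0, instead of {1, so that a match at the very beginning or end of the string still works: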

const truncateHighlightedText = (
  sentence,
  searchExpression,
  truncateLength
) => {
  // '\\b' in the string becomes the regex word-boundary token \b
  // (a bare '\b' would be a backspace character).
  const pattern = new RegExp(
    '\\b.{0,' +
    truncateLength +
    '}' +
    searchExpression +
    '.{0,' +
    truncateLength +
    '}\\b',
    'i'
  );

  return sentence.match(pattern);
};


const sentence = "Testsdfgbsegsegsrewgserwfgvsrevfse  ewrwer wergwregew    erwgrewgwerg   erwgwr eerg rg g er egr ew  erger  rtggrt tr ert tr tr tg tgr gtr  gtr egrt rtg trg rg e eg";
const searchExpression = "egseg";
const truncateLength = 30;


const result = truncateHighlightedText(sentence, searchExpression, truncateLength);
console.log(result);
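One more caveat, assuming searchExpression comes straight from user input: it is inserted into the pattern as raw regex source, so a query such as c++ makes new RegExp throw a SyntaxError, and a query containing . matches any character. A minimal sketch that escapes the query first, using a hypothetical escapeRegExp helper (the usual metacharacter-escaping one-liner):

```javascript
// Hypothetical helper: escape RegExp metacharacters so the query matches literally.
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

const truncateHighlightedText = (sentence, searchExpression, truncateLength) => {
  const pattern = new RegExp(
    '\\b.{0,' + truncateLength + '}' +
      escapeRegExp(searchExpression) + // "c++" becomes "c\+\+"
      '.{0,' + truncateLength + '}\\b',
    'i'
  );
  return sentence.match(pattern); // null when the query is not found
};

// Without escapeRegExp, new RegExp would throw a SyntaxError for "c++".
const result = truncateHighlightedText('I write C++ and Rust daily', 'c++', 4);
console.log(result && result[0]); // " C++ and"
```

The snippet can begin with a space, because the leading \b also matches just after a word.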

To add ... to whichever end of the snippet has more text that isn't shown, optionally capture one character before and after the match inside lookaround assertions, and add ... on each side where something was captured:

const truncateHighlightedText = (
  sentence,
  searchExpression,
  truncateLength
) => {
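  const pattern = new RegExp(
    '(?<=(.)?)\\b.{0,' +
    truncateLength +
    '}' +
    searchExpression +
    '.{0,' +
    truncateLength +
    '}\\b(?=(.)?)',
    'i'
  );
  const match = sentence.match(pattern);
  if (!match) return null; // searchExpression not found
  // match[1] / match[2]: the character just before / after the snippet, if any.
  return (match[1] ? '...' : '') + match[0] + (match[2] ? '...' : '');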
};


const sentence = "Testsdfgbsegsegsrewgserwfgvsrevfse  ewrwer wergwregew    erwgrewgwerg   erwgwr eerg rg g er egr ew  erger  rtggrt tr ert tr tr tg tgr gtr  gtr egrt rtg trg rg e eg";
const searchExpression = "egseg";
const truncateLength = 30;


const result = truncateHighlightedText(sentence, searchExpression, truncateLength);
console.log(result);
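The question also asks for the found word to be highlighted. One way, sketched here assuming the result is rendered as HTML, that wrapping the term in a <mark> element is acceptable, and that the source text is already HTML-safe, is a second, global replace over the snippet. highlightSnippet and escapeRegExp are hypothetical names.

```javascript
// Hypothetical helper: escape RegExp metacharacters so the query matches literally.
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Hypothetical name; same pattern as the lookbehind version above, plus highlighting.
const highlightSnippet = (sentence, searchExpression, truncateLength) => {
  const term = escapeRegExp(searchExpression);
  const pattern = new RegExp(
    '(?<=(.)?)\\b.{0,' + truncateLength + '}' + term +
      '.{0,' + truncateLength + '}\\b(?=(.)?)',
    'i'
  );
  const match = sentence.match(pattern);
  if (!match) return null; // query not found
  // Wrap every occurrence of the query inside the snippet; $& is the matched text,
  // so the original casing is preserved.
  const highlighted = match[0].replace(new RegExp(term, 'gi'), '<mark>$&</mark>');
  return (match[1] ? '...' : '') + highlighted + (match[2] ? '...' : '');
};

console.log(highlightSnippet('I write C++ and Rust daily', 'c++', 4));
// "... <mark>C++</mark> and..."
```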

If your environment doesn't support lookbehind, use capturing groups for everything and build the result from match[2] instead of match[0]:

const truncateHighlightedText = (
  sentence,
  searchExpression,
  truncateLength
) => {
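  const pattern = new RegExp(
    '(.)?(\\b.{0,' +
    truncateLength +
    '}' +
    searchExpression +
    '.{0,' +
    truncateLength +
    '})\\b(.)?',
    'i'
  );
  const match = sentence.match(pattern);
  if (!match) return null; // searchExpression not found
  // match[1] / match[3]: the character just before / after the snippet in match[2], if any.
  return (match[1] ? '...' : '') + match[2] + (match[3] ? '...' : '');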
};


const sentence = "Testsdfgbsegsegsrewgserwfgvsrevfse  ewrwer wergwregew    erwgrewgwerg   erwgwr eerg rg g er egr ew  erger  rtggrt tr ert tr tr tg tgr gtr  gtr egrt rtg trg rg e eg";
const searchExpression = "egseg";
const truncateLength = 30;


const result = truncateHighlightedText(sentence, searchExpression, truncateLength);
console.log(result);
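Another option that needs neither lookbehind nor extra capture groups, sketched under the same assumption as the code above (searchExpression is used as raw regex source): when a regex has no g flag, String.prototype.match returns an array with an index property, and comparing the snippet's start and end with the string's bounds shows whether anything was cut off. truncateWithIndex is a hypothetical name.

```javascript
// Hypothetical name; the pattern is the answer's first version, with no extra groups.
const truncateWithIndex = (sentence, searchExpression, truncateLength) => {
  const pattern = new RegExp(
    '\\b.{0,' + truncateLength + '}' + searchExpression +
      '.{0,' + truncateLength + '}\\b',
    'i'
  );
  const match = sentence.match(pattern); // no g flag, so match.index is set
  if (!match) return null;
  const end = match.index + match[0].length;
  // Text was cut off before the snippet if it doesn't start at 0,
  // and after it if it stops short of the end of the string.
  return (match.index > 0 ? '...' : '') + match[0] + (end < sentence.length ? '...' : '');
};

console.log(truncateWithIndex('the quick brown fox jumps over the lazy dog', 'fox', 6));
// "...brown fox jumps..."
```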
