Pm Rivière
Pm Rivière

Reputation: 291

Match sub-string within a string with tolerance of 1 character mismatch (in JS)

This question was successfully answered for C language but not for JS as of yet.

Write a function that can find if a string is a sub-string of another. Note that a mismatch of one character should be ignored.

A mismatch can be an extra character: ’dog’ matches ‘xxxdoogyyyy’  
A mismatch can be a missing character: ’dog’ matches ‘xxxdgyyyy’ 
A mismatch can be a different character: ’dog’ matches ‘xxxdigyyyy’

Existing fuzzy search modules are much too complex and unpredictable for this specific case. How to write such function in JS?

Note: Here is the original question and accepted solution by @IVlad (2016) in C language :

int findHelper(const char *str, const char *substr, int mustMatch = 0)
{
    if ( *substr == '\0' )
        return 1;

    if ( *str == '\0' )
        return 0;

    if ( *str == *substr )
        return findHelper(str + 1, substr + 1, mustMatch);
    else
    {
        if ( mustMatch )
            return 0;

        if ( *(str + 1) == *substr )
            return findHelper(str + 1, substr, 1);
        else if ( *str == *(substr + 1) )
            return findHelper(str, substr + 1, 1);
        else if ( *(str + 1) == *(substr + 1) )
            return findHelper(str + 1, substr + 1, 1);
        else if ( *(substr + 1) == '\0' )
            return 1;
        else
            return 0;
    }
}

int find(const char *str, const char *substr)
{
    int ok = 0;
    while ( *str != '\0' )
        ok |= findHelper(str++, substr, 0);

    return ok;
}


int main()
{
    printf("%d\n", find("xxxdoogyyyy", "dog"));
    printf("%d\n", find("xxxdgyyyy", "dog"));
    printf("%d\n", find("xxxdigyyyy", "dog"));
}

Upvotes: 3

Views: 788

Answers (2)

Soldeplata Saketos
Soldeplata Saketos

Reputation: 3461

I created my own version, that also removes special characters from the input, just in case we don't want extra matches when writing '...' that would match everything.

export const escapedRegexCharacters = /[\\.+=?!:[\](){}\-$*&|/]/g;
export const find = (word, source, options = { typos: 1, minChars: 3 }) => {
  const expressions = [];
  const { typos = 1, minChars = 3 } = options;
  const trimmed = String(word).trim();
  const replaced = trimmed.replace(escapedRegexCharacters, '');
  const chopped = replaced.split('');
  if (replaced.length < typos + 1 || replaced.length < minChars) return true;
  if (!typos) return source.match(new RegExp(replaced, 'gi'));
  for (let index = 0; index < chopped.length; index += 1) {
    const parts = [...chopped];
    parts.splice(index, 1, `.{${typos}}`);
    expressions.push(parts.join(''));
  }
  const finalExpression = expressions.join('|');

  return source.match(new RegExp(finalExpression, 'gi'));
};

Upvotes: 1

scott6
scott6

Reputation: 766

you can do this using regex generator like this

Edit : add another test cases

Edit #2: as @amadan suggest, add ?.? for char insertion scenario

var txtToSearch = 'dog';
function find(txt,src) {
  // generate regex for all possibilities. for this case, it will generate "d?.?og|do?.?g|dog?.?" -> double .? are for 1 char insertion 
  var re = new RegExp(txt.split('').map(function(a,b,c){ return txt.substr(0, b)+a+'?.?'+ txt.substr(b+1);}).join('|'),'gi');
  return src.match(re)!=null
}

// test cases
console.log('"'+txtToSearch+'" in : xxxdoogyyyy -> '+find(txtToSearch,'xxxdoogyyyy'));
console.log('"'+txtToSearch+'" in : xxxdgyyyy -> '+find(txtToSearch,'xxxdgyyyy'));
console.log('"'+txtToSearch+'" in : xxxdigyyyy -> '+find(txtToSearch,'xxxdigyyyy'));
console.log('"'+txtToSearch+'" in : xxxggoodyyyy -> '+find(txtToSearch,'xxxggoodyyyy'));

// another test case
console.log('"great" in : xxzzgreetsxxy -> '+find('great','xxzzgreetsxxy'));
console.log('"greetings" in : xxzzgreextingssxxy-> '+find('greetings','xxzzgreextingssxxy'));

Upvotes: 2

Related Questions