Rembrandt Reyes

Reputation: 73

Is there a better way to check for similarities within an array?

I am getting a response that returns an array of hashes. Each hash has two keys, "title" and "paragraph". Sometimes the responses contain similar values under the paragraph key.

For example when I just return the values in the paragraph:

["Welcome to the best place", "Welcome to the best place in the world, Boston!"]

You can see that the string at index 1 includes the string at index 0.

I map over the array of hashes to return one of the keys, "paragraph". I then try to filter out the first element if its value is repeated within any of the other elements in the array. What I have only works when the array has similar values, as stated above; it returns an empty array when it fails.

const description = hotel
    .description()
    .map(descriptions => descriptions.paragraph)
    .filter((paragraph, index) => !paragraph[index].includes(paragraph[0]))

Here hotel.description() returns the array of hashes, and the map and filter chain returns the results as an array.

The example code above returns a valid response, where the array:

["Welcome to the best place", "Welcome to the best place in the world, Boston!"]

Becomes:

["Welcome to the best place in the world, Boston!"]

But if the values in the array are unique, an empty array is returned.

The expected results are:

["You are here at the best place", "Welcome to the best place in the world, Boston!"]

The actual results are: []

Not sure what else to append to this chain to get it to return the unique values.

Upvotes: 0

Views: 116

Answers (3)

JsCoder

Reputation: 1733

Here's how you can do it in 'one go', using reduce:

const result =
  [{ paragraph: "D" }, { paragraph: "A" }, { paragraph: "ABC" }, { paragraph: "AB" }, { paragraph: "A" }, { paragraph: "DEFG" }, { paragraph: "DE" }]
    .map(({ paragraph }) => paragraph)
    .sort()
    .reverse() // after sort + reverse, a longer string comes before its own prefixes
    .reduce((existingParagraphs, currentParagraph) => {
      if (existingParagraphs.length === 0
          || !existingParagraphs.some(existingParagraph => existingParagraph.startsWith(currentParagraph))) {
        existingParagraphs.push(currentParagraph);
      }
      return existingParagraphs;
    }, []);
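
Note that startsWith only catches a paragraph that is a prefix of a longer one. A minimal sketch of a variation, assuming substring matching is what is wanted: sort longest first and test with includes. The name dropContained is hypothetical and not part of the answer above.

```javascript
// Hypothetical helper: keeps only paragraphs that are not contained
// in an already-kept paragraph of equal or greater length.
const dropContained = (paragraphs) =>
  [...paragraphs]                        // copy, so the caller's array is not reordered
    .sort((a, b) => b.length - a.length) // longest first
    .reduce((kept, current) => {
      if (!kept.some(longer => longer.includes(current))) {
        kept.push(current);
      }
      return kept;
    }, []);

console.log(dropContained([
  "Welcome to the best place",
  "Welcome to the best place in the world, Boston!"
]));
//=> ["Welcome to the best place in the world, Boston!"]
```

The result comes back ordered by length rather than in the original order, and exact duplicates are collapsed as a side effect.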

Upvotes: 0

Scott Sauyet

Reputation: 50797

This is one possibility. I separate the detection of similarity, and the choice of the better of two similar items, from the logic of keeping the similar ones. The function includes simply reports whether one of two strings is a substring of the other, and longer chooses the longer of two strings.

Obviously those helper functions can be embedded back into the main function, but I think this is more logical.

const keepSimilar = (similarTo, better) => (xs) => 
  xs.reduce((found, x) => {
    const index = found.findIndex(similarTo(x))
    if (index > -1) {
      found[index] = better(x, found[index])
    } else {
      found.push(x)
    }
    return found
  }, [])

const includes = (s1) => (s2) => s1.includes(s2) || s2.includes(s1)
const longer = (s1, s2) => s2.length > s1.length ? s2 : s1 

const similarParas = keepSimilar(includes, longer)

const paras = ['foo', 'bar', 'baz', 'foobar', 'bazqux']

console.log(similarParas(paras)) //=> ['foobar', 'bar', 'bazqux']
console.log(similarParas(['ABC', 'AB', 'DEF', 'DEFG'])) //=> ['ABC','DEFG']
console.log(similarParas([
  'Welcome to the best place', 
  'Welcome to the best place in the world, Boston!'
]))
//=> ['Welcome to the best place in the world, Boston!']

console.log(similarParas([
  'You are here at the best place', 
  'Welcome to the best place in the world, Boston!'
]))
//=> ['You are here at the best place', 'Welcome to the best place in the world, Boston!']

This is not very pretty code. I'm one of the principals of Ramda, and I would do it very differently with a library like that, especially avoiding mutation of the accumulator object. But this should work.
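
For what it's worth, the mutation of the accumulator can also be avoided in plain JavaScript. The following is only a sketch of that idea, not a Ramda version; the names keepSimilarPure, includesEither and longerOf are made up for this example and mirror the helpers above.

```javascript
// Same idea as keepSimilar, but every step returns a new array
// instead of mutating `found`.
const keepSimilarPure = (similarTo, better) => (xs) =>
  xs.reduce((found, x) => {
    const index = found.findIndex(similarTo(x));
    return index > -1
      ? found.map((y, i) => (i === index ? better(x, y) : y)) // replace one slot
      : [...found, x];                                         // append
  }, []);

// Hypothetical copies of the helpers, so the snippet runs on its own
const includesEither = (s1) => (s2) => s1.includes(s2) || s2.includes(s1);
const longerOf = (s1, s2) => (s2.length > s1.length ? s2 : s1);

const similarParasPure = keepSimilarPure(includesEither, longerOf);

console.log(similarParasPure(['ABC', 'AB', 'DEF', 'DEFG'])); //=> ['ABC', 'DEFG']
```

Copying the array on every step costs more than pushing in place, which should not matter for a handful of paragraphs.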

Upvotes: 1

pwilcox

Reputation: 5753

I'm simplifying your example to work with it, but the concept still applies here. I'm also making the following assumptions:

  • "Similar" means "includes"
  • You would be interested in all similarities, not just similarity with the first
  • Your original data has no strict duplicate phrases (this can be worked around though)
  • You prefer to remove the subset phrases and keep the superset phrases (if this makes sense).

If so, then the following approach seems to work for your needs:

let greetings = [
  "Welcome to the best place", 
  "Welcome to the best place in the world, Boston!"
];

let condensed = 
  greetings
  .filter(g => 
    !greetings.some(other => other.includes(g) && !(other == g))
  );

console.log(condensed);

And here it is not returning an empty array when all values are non-similar:

let greetings = [
  "You're at the best place", 
  "Welcome to the best place in the world, Boston!"
];

let condensed = 
  greetings
  .filter(g => 
    !greetings.some(other => other.includes(g) && !(other == g))
  );

console.log(condensed);
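
As for the "no strict duplicate phrases" assumption: with the filter as written, two identical phrases would both survive, because each one is skipped when compared with its twin. One possible workaround, sketched here under the assumption that exact duplicates should collapse to a single entry, is to deduplicate with a Set first. The variable names are made up for this example.

```javascript
const greetingsWithDuplicate = [
  "Welcome to the best place",
  "Welcome to the best place in the world, Boston!",
  "Welcome to the best place in the world, Boston!"
];

// Remove exact duplicates first, so the equality check in the filter
// cannot leave two identical superset phrases behind.
const uniqueGreetings = [...new Set(greetingsWithDuplicate)];

const condensedUnique = uniqueGreetings.filter(g =>
  !uniqueGreetings.some(other => other !== g && other.includes(g))
);

console.log(condensedUnique);
//=> ["Welcome to the best place in the world, Boston!"]
```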

Upvotes: 2
