city
city

Reputation: 2154

how to find two sentences which have the largest number of common words?

Given a list of sentences, find two sentences which have the largest number of common words. The common words does not need locate in the same position in the sentences( order does not matter).

Thanks !

update:

Does non-pairwise algorithm for this problem exist? Because pairwise is very straightforward.

my idea is to use inverted index to store where this word appears. This need traverse every word in each sentence. And then create a n*n 2D array which is used to count how many times two sentences appear in same bucket in inverted index.

Upvotes: 1

Views: 2500

Answers (2)

Assume you have an array of sentences:

String[] sentences

Create some variables which contain default values to keep track of the two sentences with the most common words

sentence1Index = -1
sentence2Index = -1
maxCount = -1

Do a nested loop on sentences array

for i : 0 -> sentences.length
    for j : 0 -> sentences.length

Make sure you aren't checking the same sentence

  if i != j

Split the Strings by empty space (which will usually give you each word assuming you count some symbols as words)

  String[] words1 = sentences[i].splitAt(" ")
  String[] words2 = sentences[j].splitAt(" ")

Create a temporary count value for this run

  tempCount = 0

Loop between two word arrays (gotten from the two sentences you are comparing)

  for a : 0 -> words1 .length
      for b : 0 -> words2.length

If the word is the same, then increment temp count

          if words[a] equal-to-ignore-case words[b]
              tempCount++

After finishing comparing words, if the tempCount is greater than current maxCount, update all values that keep track of you are looking for

  if tempCount > maxCount
      sentence1Index = i
      sentence2Index = j
      maxCount = tempCount

Return newly created array which the two sentences

if sentence1Index != -1 and sentence2Index  != -1
    String[] retArray =   sentences[sentence1Index], sentences[sentence2Index ]
    return retArray

return null

All pseudo code:

String[] sentences
sentence1Index = -1
sentence2Index = -1
maxCount = -1

for i : 0 -> sentences.length
    for j : 0 -> sentences.length
      if i != j
          String[] words1 = sentences[i].splitAt(" ")
          String[] words2 = sentences[j].splitAt(" ")
          tempCount = 0
          for a : 0 -> words1 .length
              for b : 0 -> words2.length
                  if words[a] equal-to-ignore-case words[b]
                      tempCount++
          if tempCount > maxCount
              sentence1Index = i
              sentence2Index = j
              maxCount = tempCount

if sentence1Index != -1 and sentence2Index  != -1
    String[] retArray =   sentences[sentence1Index], sentences[sentence2Index ]
    return retArray

return null

Upvotes: 1

Gavin
Gavin

Reputation: 8200

First you need a method that will take two of the sentences and determine how many words they have in common. This could work by taking the two sentences given as input, and creating from it two arrays containing the words in alphabetical order. Then you can examine the two arrays, advancing forward whichever array comes alphabetically earlier (so if the current match is "abacus" and "book", you would move "abacus" to the next word). If you have a match ("book" and "book"), then you increment a count of matched words, and move both arrays to the next word. You continue to do this until you reach the end of one of the arrays (since the remainder of words in the other array won't have any matches).

Once you have this algorithm implemented, you will need a loop that will look as follows:

for (i = 0; i < sentenceCount - 1; i++) {
    for (j = i+1; j < sentenceCount; j++) {
    }
}

Inside the loop you will call your function that calculates the number of words in common using the sentences at indexes i and j. You will keep track of the most number of words in common seen up to that point, and the two sentences where it was found. If a new sentence has a higher number of words in common, you will store that count and the two sentences that yielded that count. At the end, you'll have the two sentences that you want.

Upvotes: 0

Related Questions