Marwan
Marwan

Reputation: 81

C# compare two strings for matching words

I have two strings containing letters and numbers separated by spaces. ex "elza7ma wa2fa fel matab" and "2ana ba7eb el za7ma 2awy 2awy"

What is the fastest way to compare this two string to find out whether or not they have a word in common?

I tried to split one of them using string.split and use string.compare on the whole array of words. but this is very slow since I will be comparing a lot of strings.

Upvotes: 7

Views: 9682

Answers (5)

Russ Cam
Russ Cam

Reputation: 125488

A LINQ solution

"elza7ma wa2fa fel matab".Split()
                         .Intersect("2ana ba7eb el za7ma 2awy 2awy".Split())
                         .Any();

// as a string extension method
public static class StringExtensions
{
    public static bool OneWordMatches(this string theString, string otherString)
    {
        return theString.Split().Intersect(otherString.Split()).Any();
    }
}

// returns true
"elza7ma wa2fa fel matab 2ana".OneWordMatches("2ana ba7eb el za7ma 2awy 2awy");

Upvotes: 15

Sjoerd
Sjoerd

Reputation: 75588

  • The easiest way would be to compare all words to any other word. This is an easy solution, but slow.
  • Another way is to sort both lists, and then compare the top two entries. Like mergesort, but with the goal of finding equal words.
  • Another way is to compile the list of words into a tree, and match the words against that tree. A regex can do this, or you can do this yourself. In your example, the first letter should be 2, b, e or z. This way, each word is only inspected once and the least number of characters are inspected.

Upvotes: 0

Matt
Matt

Reputation: 26971

I'd probably take the initial performance hit and split the string and then order alphabetically and by word length. If you just have to find out if one word matches, break as soon as you find one. Once you have the split string arrays ordered alphabetically and by length, that limits the numbers of compares you would have to do.

Upvotes: 0

Michael Banzon
Michael Banzon

Reputation: 4967

You could split the two strings by word and build two hashtables/dictionaries. Then go through both and add keys incrementing an int in a third dictionary (Dictionary<string, int>). If any key in the third dictionary have a count of more than one, that word is in both original strings.

I would think that any algorithm to solve this problem would be 'slow' - especially for large input strings/many words.

Upvotes: 1

JaredPar
JaredPar

Reputation: 754575

I think the easiest way is to break up the strings into words and use a set structure like HashSet<string> to check for duplicates. For example

public bool HasMatchingWord(string left, string right) { 
  var hashSet = new HashSet<string>(
    left.Split(" ", StringSplitOptions.RemoveEmptyEntries)); 
  return right
    .Split(" ", StringSplitOptions.RemoveEmptyEntries)
    .Any(x => hashSet.Contains(x));
}

Upvotes: 5

Related Questions