Asynchronous

Reputation: 3977

Comparing user input to a string read from a CSV file and choosing the output string based on the number of similar characters

The code below compares user input to the names of US cities in a CSV file. The file is comma-delimited, with a single column and a header row. If the user input matches a row in the file, the string from the file is returned; if there is no match, the user input is returned.

In addition to returning data based on an exact match, how can I also return either the string from the file or the user input based on the number of matching characters?

Example:

User input: Brookly
String in file: Brooklyn
Output: Brooklyn

In the example above, only one character is different. So the rule could be: if the total character difference is one, return the string from the file; otherwise, return the user input.
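Taken literally, "number of different characters" could be counted position by position. The minimal sketch below (the class and method names are hypothetical, not from the question) does that, adding the length difference for the extra characters. It gives 1 for Brookly versus Brooklyn, but for "rooklyn" versus "Brooklyn" every character after the missing first letter is shifted, so it reports 7 even though only one letter is missing. That shift problem is why an edit distance is usually the better measure.

```csharp
using System;

public static class NaiveDiff
{
    // Hypothetical helper: counts characters that differ at the same index,
    // plus the difference in length. This is NOT an edit distance; a single
    // missing character near the start shifts everything after it.
    public static int PositionalDifference(string a, string b)
    {
        int shorter = Math.Min(a.Length, b.Length);
        int diff = Math.Abs(a.Length - b.Length);
        for (int i = 0; i < shorter; i++)
        {
            if (a[i] != b[i]) diff++;
        }
        return diff;
    }
}
```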

The RemoveAllFormat method in the code simply strips all formatting so that only the bare strings are compared.
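The body of RemoveAllFormat is not shown in the question. Purely as an assumption about what "strips all formatting" might mean, a hypothetical stand-in could keep only letters and digits and lower-case them, so that quotes, spaces and case no longer affect the comparison:

```csharp
using System.Linq;

public static class Normalizer
{
    // Hypothetical stand-in for the question's RemoveAllFormat (the real one
    // is not shown): keeps only letters and digits, lower-cased invariantly.
    public static string RemoveAllFormat(string s)
    {
        if (s == null) return string.Empty;
        return new string(s.Where(c => char.IsLetterOrDigit(c))
                           .Select(c => char.ToLowerInvariant(c))
                           .ToArray());
    }
}
```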

Code:

public string MatchedCity(string input)
{
    // Requires "using System.IO;" at the top of the file.
    const string lookupFile = @"X:\city.csv";
    string normalizedInput = RemoveAllFormat(input);

    using (StreamReader r = new StreamReader(lookupFile))
    {
        string line;
        while ((line = r.ReadLine()) != null)
        {
            // Each line is split on commas; note that the header row is compared too.
            foreach (string city in line.Split(','))
            {
                // Exact match after stripping formatting (empty fields are skipped).
                if (city.Length > 0 && string.Equals(normalizedInput, RemoveAllFormat(city)))
                {
                    // Return the file's spelling without quote characters.
                    return city.Replace("\"", "");
                }
            }
        }
    }

    // No match found: fall back to the user's input.
    return input;
}

Upvotes: 2

Views: 818

Answers (1)

merlin2011

Reputation: 75585

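You can compute the Levenshtein distance (the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other) between the input and each city name using this code someone kindly posted here. There is also another implementation here, under a more clearly stated license.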

You can then decide how much distance you are willing to tolerate as "close enough" (for Brookly versus Brooklyn the distance is 1), and output the rows whose distance is small enough for your purposes.
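A minimal sketch of that approach, assuming the city names have already been read from the file into a sequence: the standard two-row dynamic-programming Levenshtein distance, plus a hypothetical ClosestCity helper (the name and the maxDistance parameter are illustrative, not from the linked code) that returns the nearest city when its distance is within the threshold and the input otherwise. Comparison here is case-insensitive and surrounding quotes are trimmed; the question's RemoveAllFormat could be used instead.

```csharp
using System;
using System.Collections.Generic;

public static class FuzzyCityMatcher
{
    // Levenshtein distance: minimum number of single-character insertions,
    // deletions and substitutions turning a into b (two-row DP table).
    public static int Levenshtein(string a, string b)
    {
        var prev = new int[b.Length + 1];
        var curr = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) prev[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.Min(Math.Min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            var tmp = prev; prev = curr; curr = tmp; // reuse arrays for next row
        }
        return prev[b.Length];
    }

    // Hypothetical helper: returns the city closest to the input if its
    // distance is <= maxDistance; otherwise returns the input unchanged.
    public static string ClosestCity(string input, IEnumerable<string> cities, int maxDistance)
    {
        string key = input.Trim().ToLowerInvariant();
        string best = null;
        int bestDistance = int.MaxValue;

        foreach (string raw in cities)
        {
            string city = raw.Trim().Trim('"');
            int d = Levenshtein(key, city.ToLowerInvariant());
            if (d < bestDistance)
            {
                bestDistance = d;
                best = city;
            }
        }
        return best != null && bestDistance <= maxDistance ? best : input;
    }
}
```

With maxDistance set to 1, "Brookly" maps to "Brooklyn", while an input with no city within one edit is returned as typed. Keeping the closest match, rather than the first one under the threshold, avoids picking an arbitrary city when several are nearby.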

Upvotes: 4
