shuvo sarker

Reputation: 881

Count Unique words in a string in C#

Suppose I have a string like this:

String: I have a car.I had bought it two years ago.I like it very much.

I need to find only the unique words in it, that is, the words that appear in the string just once. For example, the unique words in this string are have, a, car, had, bought, it, two, years, etc. I have tried it with LINQ; please take a look.

string testingtext="I have a car.I had bought it two years ago.I like it very    much.";
MatchCollection Wordcollection = Regex.Matches(testingtext, @"[\S]+");

string[] array = Wordcollection.Cast<Match>().Select(x => x.Value).Distinct().OrderBy(y => y).ToArray();

Upvotes: 1

Views: 7059

Answers (6)

Martin Kramer

Reputation: 1

The easiest way is to use a HashSet<string>:

    string testingtext = "I have a car.I had bought it two years ago.I like it very    much.";
    string[] words = testingtext.Split(' ');
    HashSet<string> uniqueWords = new HashSet<string>();
    foreach (string word in words)
    {
        uniqueWords.Add(word);
    }

Upvotes: 0

user1297556

Reputation:

    public int GetUniqueWordsCount(string txt)
    {
        // Use regular expressions to replace characters
        // that are not letters or numbers with spaces.
        txt = new Regex("[^a-zA-Z0-9]").Replace(txt, " ");

        // Split the text into words.
        var words = txt.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        // Use LINQ to get the unique words.
        var wordQuery = words.Distinct();

        return wordQuery.Count();

        // If you want the words themselves, change the return type to string[] and use:
        // return wordQuery.ToArray();
    }

Upvotes: 0

Dmitrii Bychenko

Reputation: 186823

There's no need for LINQ; HashSet<String> is quite enough:

  String source = "I have a car I had bought it two years ago I like it very much.";
  //TODO: check this    
  Char[] separators = new Char[] {' ', '\r', '\n', '\t', ',', '.', ';', '!', '?'};

  HashSet<String> uniqueWords =
    new HashSet<String>(source.Split(separators, StringSplitOptions.RemoveEmptyEntries),
    StringComparer.OrdinalIgnoreCase);

  // 13
  Console.Write(uniqueWords.Count);
  ...
  // I, have, a, car, had, bought, it, two, years, ago, like, very, much
  Console.Write(String.Join(", ", uniqueWords));

Please note that solutions of this kind work only in simple cases; a word in a natural language is a vague notion, so in the general case of NLP (Natural Language Processing) you have to use specially designed libraries.

Upvotes: 0

Pierre-Luc Pineault

Reputation: 9201

Distinct cannot be used for this task. Distinct simply removes the duplicates of a word; you still get every word, whether it was unique or not.

Instead, you need to use GroupBy. It groups the words into key-value pairs, where each key is a word and each group holds all occurrences of that word.

Once you have that, simply take each key for which the group contains only one value (i.e. the word appears only once in the string):

    string testingtext = "I have a car I had bought it two years ago I like it very much.";

    IEnumerable<string> allWords = testingtext.Split(' ');
    IEnumerable<string> uniqueWords = allWords.GroupBy(w => w).Where(g => g.Count() == 1).Select(g => g.Key);

You might also want to clean your input text beforehand to remove the punctuation, if you want to treat car and car. as the same word.
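
As a rough sketch of that clean-up combined with GroupBy: the helper name WordsAppearingOnce, the class name, and the separator list below are assumptions made for illustration, not part of the answer above. The sketch splits on whitespace and common punctuation and compares words case-insensitively.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class UniqueWordsSketch
{
    // Hypothetical helper (not from the original answer): returns the words
    // that occur exactly once in the text, ignoring case and punctuation.
    public static List<string> WordsAppearingOnce(string text)
    {
        // Assumed separator set; extend it if your text uses other punctuation.
        char[] separators = { ' ', '\t', '\r', '\n', '.', ',', ';', '!', '?' };

        return text
            // Splitting on punctuation makes "car" and "car." the same token.
            .Split(separators, StringSplitOptions.RemoveEmptyEntries)
            // Group identical words together, treating "It" and "it" as equal.
            .GroupBy(w => w, StringComparer.OrdinalIgnoreCase)
            // Keep only the groups with a single occurrence.
            .Where(g => g.Count() == 1)
            .Select(g => g.Key)
            .ToList();
    }
}
```

With the string from the question, this leaves out I and it, since both occur more than once, and keeps the remaining words in their order of first appearance.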

Upvotes: 2

Leon Barkan

Reputation: 2703

string[] wordsArray = testingtext.Replace("."," ").Split(' ');
int carCounter = 0;
int haveCounter = 0;
//...

foreach (String word in wordsArray)
{
    if (word.Equals("car"))
        carCounter++;
    if (word.Equals("have"))
        haveCounter++;
    //...
}

After that, you know how many times each word occurs. Simple.

Upvotes: 0

Mohit S

Reputation: 14064

This might solve your issue

string MyStr = "I have a car.I had bought it two years ago.I like it very much";
var wrodList = MyStr.Split(null);
var output = wrodList.GroupBy(x => x).Select(y => new Word { charchter = y.Key, repeat = y.Count() }).OrderBy(z=>z.repeat);
foreach (var item in output)
{
    textBoxfile.Text += item.charchter +" : "+ item.repeat+Environment.NewLine;
}

You also need to create a class (Word):

public class Word
{
    public string  charchter { get; set; }
    public int repeat { get; set; }
}

Upvotes: 0
