Reputation: 881
Suppose I have a string like this:
String: I have a car.I had bought it two years ago.I like it very much.
I need to find only the unique words in it, that is, the words that appear in the string just once. For example, the unique words in this string are have, a, car, had, bought, it, two, years, etc. I have tried it with LINQ; please take a look.
string testingtext="I have a car.I had bought it two years ago.I like it very much.";
MatchCollection Wordcollection = Regex.Matches(testingtext, @"[\S]+");
string[] array = Wordcollection.Cast<Match>().Select(x => x.Value).Distinct().OrderBy(y => y).ToArray();
Upvotes: 1
Views: 7059
Reputation: 1
The easiest way is to use a HashSet<string>:
string testingtext = "I have a car.I had bought it two years ago.I like it very much.";
string[] words = testingtext.Split(' ');
HashSet<string> uniqueWords = new HashSet<string>();
foreach (string word in words)
{
    uniqueWords.Add(word);
}
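A HashSet<string> on its own yields every distinct word, including the ones that repeat. If the goal is the words that occur exactly once, one possible sketch relies on HashSet<T>.Add returning false for an element that is already present. It assumes C# 9 or later (top-level statements) and splits on '.' as well as ' ', which is an assumption added here so that "car.I" becomes two words; the set named repeated is likewise introduced only for illustration.

```csharp
using System;
using System.Collections.Generic;

string testingtext = "I have a car.I had bought it two years ago.I like it very much.";

// Assumption: '.' is treated as a separator in addition to ' '.
string[] words = testingtext.Split(new[] { ' ', '.' }, StringSplitOptions.RemoveEmptyEntries);

var seen = new HashSet<string>();
var repeated = new HashSet<string>();
foreach (string word in words)
{
    // Add returns false when the word is already in the set.
    if (!seen.Add(word))
    {
        repeated.Add(word);
    }
}

// Drop every word that was encountered more than once.
seen.ExceptWith(repeated);

Console.WriteLine(string.Join(", ", seen));
```

The comparison is case-sensitive here; passing a StringComparer such as StringComparer.OrdinalIgnoreCase to both HashSet constructors would make "It" and "it" count as the same word.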
Upvotes: 0
Reputation:
public int GetUniqueWordsCount(string txt)
{
    // Use a regular expression to replace characters
    // that are not letters or numbers with spaces.
    txt = Regex.Replace(txt, "[^a-zA-Z0-9]", " ");
    // Split the text into words.
    var words = txt.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    // Use LINQ to get the distinct words.
    var wordQuery = words.Distinct();
    return wordQuery.Count();
    // If you want the words themselves instead:
    // return wordQuery.ToArray();
}
Upvotes: 0
Reputation: 186823
There's no need for LINQ; a HashSet<String> is quite enough:
String source = "I have a car I had bought it two years ago I like it very much.";
//TODO: check this
Char[] separators = new Char[] {' ', '\r', '\n', '\t', ',', '.', ';', '!', '?'};
HashSet<String> uniqueWords = new HashSet<String>(
    source.Split(separators, StringSplitOptions.RemoveEmptyEntries),
    StringComparer.OrdinalIgnoreCase);
// 13
Console.Write(uniqueWords.Count);
...
// I, have, a, car, had, bought, it, two, years, ago, like, very, much
Console.Write(String.Join(", ", uniqueWords));
Please note that solutions of this kind work in simple cases only. A word is a vague notion in natural languages, so in the general case of NLP (Natural Language Processing) you have to use specially designed libraries.
Upvotes: 0
Reputation: 9201
Distinct cannot be used for this task. Distinct will simply remove all duplicates of a word; you'll still get every word, whether it was unique or not.
Instead, you need to use GroupBy. It builds a list of groups, each with a word as its key and every occurrence of that word as its values.
Once you have that, simply take each key for which the group contains only one value (i.e. the word appears only once in the string):
string testingtext = "I have a car I had bought it two years ago I like it very much.";
IEnumerable<string> allWords = testingtext.Split(' ');
IEnumerable<string> uniqueWords = allWords.GroupBy(w => w).Where(g => g.Count() == 1).Select(g => g.Key);
You might also want to clean your input text beforehand to remove the punctuation, if you want to treat car and car. as the same word.
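Putting the two steps together (cleaning the input, then grouping), a minimal sketch could look like the following. It assumes C# 9 or later for top-level statements, uses the regular expression \w+ to decide what counts as a word, and compares words case-insensitively; those choices are assumptions for illustration, not part of the answer above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string testingtext = "I have a car.I had bought it two years ago.I like it very much.";

// Assumption: a "word" is any run of word characters (letters, digits, '_').
// This drops the punctuation, so "car.I" yields "car" and "I".
string[] allWords = Regex.Matches(testingtext, @"\w+")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

// Group equal words (ignoring case) and keep only groups with exactly one member.
string[] onceOnly = allWords
    .GroupBy(w => w, StringComparer.OrdinalIgnoreCase)
    .Where(g => g.Count() == 1)
    .Select(g => g.Key)
    .ToArray();

// Prints: have, a, car, had, bought, two, years, ago, like, very, much
Console.WriteLine(string.Join(", ", onceOnly));
```

"I" (three occurrences) and "it" (two occurrences) are filtered out, which is the difference from a Distinct-based approach.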
Upvotes: 2
Reputation: 2703
string[] wordsArray = testingtext.Replace("."," ").Split(' ');
int carCounter = 0;
int haveCounter = 0;
//...
foreach (String word in wordsArray)
{
    if (word.Equals("car"))
        carCounter++;
    if (word.Equals("have"))
        haveCounter++;
    //...
}
After that you know how many times each word occurs. Simple.
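The hard-coded counters only cover words that are known in advance. As a hedged alternative, the sketch below keeps one counter per word in a Dictionary<string, int>, so any input text works. It assumes C# 9 or later (top-level statements); the case-insensitive comparer and the removal of empty entries are assumptions added here, and counters is a name chosen for illustration.

```csharp
using System;
using System.Collections.Generic;

string testingtext = "I have a car.I had bought it two years ago.I like it very much.";

// Same splitting idea, but skip the empty entries that replacing "." with " " leaves behind.
string[] wordsArray = testingtext.Replace(".", " ")
    .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

// One counter per distinct word instead of one variable per word.
var counters = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
foreach (string word in wordsArray)
{
    // TryGetValue leaves count at 0 when the word has not been seen yet.
    counters.TryGetValue(word, out int count);
    counters[word] = count + 1;
}

foreach (KeyValuePair<string, int> entry in counters)
{
    Console.WriteLine(entry.Key + ": " + entry.Value);
}
```

The words that appear exactly once are then the keys whose counter equals 1.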
Upvotes: 0
Reputation: 14064
This might solve your issue:
string MyStr = "I have a car.I had bought it two years ago.I like it very much";
var wrodList = MyStr.Split(null);
var output = wrodList.GroupBy(x => x).Select(y => new Word { charchter = y.Key, repeat = y.Count() }).OrderBy(z=>z.repeat);
foreach (var item in output)
{
    textBoxfile.Text += item.charchter + " : " + item.repeat + Environment.NewLine;
}
You also need to create a class (Word):
public class Word
{
    public string charchter { get; set; }
    public int repeat { get; set; }
}
Upvotes: 0