Reputation: 2027
I have been trying to find the most frequent words in a list of strings. I tried something like the approach in "Find the most occurring number in a List<int>",
but the issue is that it returns only one word, whereas I need all of the words that share the highest frequency.
For example, if we call that LINQ query on the following list:
Dubai
Karachi
Lahore
Madrid
Dubai
Sydney
Sharjah
Lahore
Cairo
it should give us:
ans: Dubai, Lahore
Upvotes: 3
Views: 3291
Reputation: 1857
I'm sure there must be a better way, but one thing I managed to put together (which may help you optimise it further) is the following:
List<string> list = new List<string>();
list.Add("Dubai");
list.Add("Sharjah");
list.Add("Dubai");
list.Add("Lahore");
list.Add("Dubai");
list.Add("Sharjah");
list.Add("Sharjah");
// Highest number of occurrences of any single word.
int most = list.GroupBy(i => i)
    .Max(grp => grp.Count());
// Every word whose count equals that maximum.
IEnumerable<string> mostVal = list.GroupBy(i => i)
    .Where(grp => grp.Count() == most)
    .Select(grp => grp.Key);
This yields the words that occur most frequently; if two entries occur with the same top frequency, both are included.
NOTE: this does not select every entry that occurs more than once, only those with the highest frequency.
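On the "more optimised" point, here is a minimal sketch that counts the words in a single pass with a dictionary instead of grouping twice. The class and method names (WordCounter, MostFrequent) are hypothetical and only for illustration, the alphabetical ordering of the result is my own choice, and the `out int` declaration assumes C# 7 or later.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class WordCounter
{
    // Hypothetical helper name; not taken from the answer above.
    public static List<string> MostFrequent(IEnumerable<string> words)
    {
        // Count each word in a single pass over the input.
        var counts = new Dictionary<string, int>();
        foreach (var word in words)
        {
            counts.TryGetValue(word, out int current); // current is 0 when the word is absent
            counts[word] = current + 1;
        }

        // An empty input has no most frequent word.
        if (counts.Count == 0)
            return new List<string>();

        int max = counts.Values.Max();

        // Keep every word tied for the highest count, sorted alphabetically.
        return counts
            .Where(pair => pair.Value == max)
            .Select(pair => pair.Key)
            .OrderBy(word => word)
            .ToList();
    }
}
```

Called on the list from the question, this should return Dubai and Lahore, since both occur twice and nothing occurs more often.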
Upvotes: 1
Reputation: 186668
In case you want Dubai, Lahore only (i.e. only the words with the top occurrence count, which is 2 in the sample):
List<String> list = new List<String>() {
    "Dubai", "Karachi", "Lahore", "Madrid", "Dubai", "Sydney", "Sharjah", "Lahore", "Cairo"
};
int count = -1;
var result = list
    .GroupBy(s => s, s => 1)
    .Select(chunk => new {
        name = chunk.Key,
        count = chunk.Count()
    })
    .OrderByDescending(item => item.count)
    .ThenBy(item => item.name)
    .Where(item => {
        if (count < 0) {
            count = item.count; // side effects, alas (we don't know count a priori)
            return true;
        }
        else
            return item.count == count;
    })
    .Select(item => item.name);
Test:
// ans: Dubai, Lahore
Console.Write("ans: " + String.Join(", ", result));
Upvotes: 1
Reputation: 1461
If you want to get all of the most frequent words, you can use this method:
public List<string> GetMostFrequentWords(List<string> list)
{
    // Materialise the groups once; the result is read several times below.
    var groups = list
        .GroupBy(x => x)
        .Select(x => new { word = x.Key, Count = x.Count() })
        .OrderByDescending(x => x.Count)
        .ToList();
    if (!groups.Any()) return new List<string>();
    var maxCount = groups.First().Count;
    return groups.Where(x => x.Count == maxCount).Select(x => x.word).OrderBy(x => x).ToList();
}
[TestMethod]
public void Test()
{
    var list = @"Dubai,Karachi,Lahore,Madrid,Dubai,Sydney,Sharjah,Lahore,Cairo".Split(',').ToList();
    var result = GetMostFrequentWords(list);
    Assert.AreEqual(2, result.Count);
    Assert.AreEqual("Dubai", result[0]);
    Assert.AreEqual("Lahore", result[1]);
}
Upvotes: 1
Reputation: 39326
Use a group by and then order by count. This returns every word that occurs more than once, most frequent first:
var result = list
    .GroupBy(s => s)
    .Where(g => g.Count() > 1)
    .OrderByDescending(g => g.Count())
    .Select(g => g.Key);
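If only the words tied for the highest count are wanted, one possible refinement is to cut the ordered groups off with TakeWhile. The sketch below is an illustration under assumptions: the names TopGroups and TopWords are hypothetical, and unlike the query above it drops the Count() > 1 filter, so a list in which every word occurs once returns all of its words.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class TopGroups
{
    // Hypothetical helper name, used only for illustration.
    public static List<string> TopWords(IEnumerable<string> list)
    {
        // Group and sort by descending count, materialised once.
        var ordered = list
            .GroupBy(s => s)
            .Select(g => new { Word = g.Key, Count = g.Count() })
            .OrderByDescending(x => x.Count)
            .ToList();

        // An empty input has no top group.
        if (ordered.Count == 0)
            return new List<string>();

        int top = ordered[0].Count;

        // The sequence is sorted, so the words with the top count form a prefix.
        return ordered
            .TakeWhile(x => x.Count == top)
            .Select(x => x.Word)
            .ToList();
    }
}
```

Because OrderByDescending is a stable sort and GroupBy yields keys in order of first appearance, tied words should come out in the order they first occur in the input.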
Upvotes: 5
Reputation: 6662
If you need all words that occur repeatedly (more than once):
List<string> list = new List<string>();
list.Add("A");
list.Add("A");
list.Add("B");
var most = (from i in list
            group i by i into grp
            orderby grp.Count() descending
            select new { grp.Key, Cnt = grp.Count() }).Where(r => r.Cnt > 1);
Upvotes: 2