Failed Scientist

Reputation: 2027

Find Most Frequent Words using LINQ

I have been trying to find the most frequent words in a list of strings. I tried the approach from "Find the most occurring number in a List<int>", but it returns only one word, and I need all the words that share the highest frequency.

For example, if we run that LINQ query on the following list:

Dubai
Karachi
Lahore
Madrid
Dubai
Sydney
Sharjah
Lahore
Cairo

it should give us:

ans: Dubai, Lahore

Upvotes: 3

Views: 3291

Answers (5)

Amit

Reputation: 1857

I'm sure there must be a better way, but one thing I managed to put together (which may help you optimise it further) is the following:

List<string> list = new List<string>();
list.Add("Dubai");
list.Add("Sarjah");
list.Add("Dubai");
list.Add("Lahor");
list.Add("Dubai");
list.Add("Sarjah");
list.Add("Sarjah");

int most = list.GroupBy(i => i).OrderByDescending(grp => grp.Count())
    .Select(grp => grp.Count()).First();
IEnumerable<string> mostVal = list.GroupBy(i => i).OrderByDescending(grp => grp.Count())
    .Where(grp => grp.Count() >= most)
    .Select(grp => grp.Key);

This lists the entries that occur most frequently; if two entries share the same top frequency, both are included.

NOTE: this selects only the entries with the highest frequency, not every entry that occurs more than once.
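As a rough sketch of the same idea with the grouping done only once: count each word, take the maximum count, then keep the words that reach it. The class and method names here (`MostFrequentWords`, `Find`) are made up for illustration and are not part of the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MostFrequentWords
{
    // Hypothetical helper: returns every word that shares the highest count.
    public static List<string> Find(IEnumerable<string> words)
    {
        // Group once and materialize, so the counts are not recomputed per query.
        var counts = words
            .GroupBy(w => w)
            .Select(g => new { Word = g.Key, Count = g.Count() })
            .ToList();

        // An empty input has no maximum; return an empty result instead of throwing.
        if (counts.Count == 0)
            return new List<string>();

        int max = counts.Max(c => c.Count);

        // Keep only the words whose count equals the maximum.
        return counts
            .Where(c => c.Count == max)
            .Select(c => c.Word)
            .ToList();
    }

    public static void Main()
    {
        var list = new List<string>
        {
            "Dubai", "Sarjah", "Dubai", "Lahor", "Dubai", "Sarjah", "Sarjah"
        };

        // Prints: Dubai, Sarjah
        Console.WriteLine(string.Join(", ", Find(list)));
    }
}
```

Because `GroupBy` yields groups in the order their keys first appear, ties come back in first-seen order; add an `OrderBy` if alphabetical output is wanted.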

Upvotes: 1

Dmitrii Bychenko

Reputation: 186668

In case you want Dubai, Lahore only (i.e. only words with top occurrence, which is 2 in the sample):

  List<String> list = new List<String>() {
   "Dubai", "Karachi", "Lahore", "Madrid", "Dubai", "Sydney", "Sharjah", "Lahore", "Cairo"
   };

  int count = -1;

  var result = list
    .GroupBy(s => s, s => 1)
    .Select(chunk => new {
      name = chunk.Key,
      count = chunk.Count()
     })
    .OrderByDescending(item => item.count)
    .ThenBy(item => item.name)
    .Where(item => {
      if (count < 0) {
        count = item.count; // side effect, alas (we don't know the count a priori)

        return true;
      }
      else
        return item.count == count;
    })
    .Select(item => item.name);

Test:

  // ans: Dubai, Lahore
  Console.Write("ans: " + String.Join(", ", result));

Upvotes: 1

Alex Vazhev

Reputation: 1461

If you want to get several most frequent words, you can use this method:

public List<string> GetMostFrequentWords(List<string> list)
{
    var groups = list.GroupBy(x => x).Select(x => new { word = x.Key, Count = x.Count() }).OrderByDescending(x => x.Count);
    if (!groups.Any()) return new List<string>();

    var maxCount = groups.First().Count;

    return groups.Where(x => x.Count == maxCount).Select(x => x.word).OrderBy(x => x).ToList();
}

[TestMethod]
public void Test()
{
    var list = @"Dubai,Karachi,Lahore,Madrid,Dubai,Sydney,Sharjah,Lahore,Cairo".Split(',').ToList();
    var result = GetMostFrequentWords(list);

    Assert.AreEqual(2, result.Count);
    Assert.AreEqual("Dubai", result[0]);
    Assert.AreEqual("Lahore", result[1]);
}

Upvotes: 1

ocuenca

Reputation: 39326

Use a group by and then order by count:

var result = list
  .GroupBy(s => s)
  .Where(g => g.Count() > 1)
  .OrderByDescending(g => g.Count())
  .Select(g => g.Key);
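To see what this query returns, here is a small self-contained sketch that runs it on the list from the question. The wrapper class and method (`RepeatedWordsDemo`, `Repeated`) are hypothetical names added only to make the snippet runnable.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RepeatedWordsDemo
{
    // Hypothetical wrapper: words occurring more than once, most frequent first.
    public static List<string> Repeated(IEnumerable<string> list)
    {
        return list
            .GroupBy(s => s)
            .Where(g => g.Count() > 1)          // drop words that occur only once
            .OrderByDescending(g => g.Count())  // highest count first
            .Select(g => g.Key)
            .ToList();
    }

    public static void Main()
    {
        var list = new List<string>
        {
            "Dubai", "Karachi", "Lahore", "Madrid", "Dubai",
            "Sydney", "Sharjah", "Lahore", "Cairo"
        };

        // Prints: Dubai, Lahore
        Console.WriteLine(string.Join(", ", Repeated(list)));
    }
}
```

Note that the filter is `Count() > 1`, so if one word occurred three times and another twice, both would be returned, with the more frequent one first.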

Upvotes: 5

Abdul Rehman Sayed

Reputation: 6662

If you need all the words that occur more than once:

List<string> list = new List<string>();
list.Add("A");
list.Add("A");
list.Add("B");
var most = (from i in list
            group i by i into grp
            orderby grp.Count() descending
            select new { grp.Key, Cnt = grp.Count() }).Where(r => r.Cnt > 1);

Upvotes: 2
