FSm

Reputation: 2057

IEnumerable<string> to Dictionary<string, int>

I'm using the following code to split an array of strings into a list of distinct terms.

private List<string> GenerateTerms(string[] docs)
{
    return docs.SelectMany(doc => ProcessDocument(doc)).Distinct().ToList();
}

private IEnumerable<string> ProcessDocument(string doc)
{
    return doc.Split(' ')
              .GroupBy(word => word)
              .OrderByDescending(g => g.Count())
              .Select(g => g.Key)
              .Take(1000);
}

What I want to do is return a Dictionary<string, int> instead of the list.

Could anyone help? Thanks in advance.

Upvotes: 2

Views: 2869

Answers (4)

Hamlet Hakobyan

Reputation: 33391

Try this:

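string[] docs = {"aaa bbb", "aaa ccc", "sss, ccc"};

// Take must come before ToDictionary; calling it afterwards would return
// an IEnumerable<KeyValuePair<string, int>> rather than a dictionary.
var result = docs.SelectMany(doc => doc.Split())
                 .GroupBy(word => word)
                 .OrderByDescending(g => g.Count())
                 .Take(1000)
                 .ToDictionary(g => g.Key, g => g.Count());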

EDIT:

var result = docs.SelectMany(
        doc => doc.Split()
            .GroupBy(word => word)
            .OrderByDescending(g => g.Count())
            .Take(1000))
    .Select(g => new {Word = g.Key, Cnt = g.Count()})
    .GroupBy(t => t.Word)
    .ToDictionary(g => g.Key, g => g.Sum(t => t.Cnt));

Upvotes: 1

Habib

Reputation: 223412

string doc = "This is a test sentence with some words with some words repeating like: is a test";
var result = doc.Split(' ')
                .GroupBy(word => word)
                .OrderByDescending(g => g.Count())
                .Take(1000)
                .ToDictionary(r => r.Key, r => r.Count());

EDIT:

I believe you want a final dictionary built from the array of strings, with words as keys and their total counts as values. Since a dictionary can't contain duplicate keys, you no longer need Distinct. You have to rewrite your methods as:

private Dictionary<string,int> GenerateTerms(string[] docs)
{
    List<Dictionary<string, int>> combinedDictionaryList = new List<Dictionary<string, int>>();
    foreach (string str in docs)
    {
        //Add returned dictionaries to a list
        combinedDictionaryList.Add(ProcessDocument(str));
    }
    //return a single dictionary from the list of dictionaries, summing counts per word
    return combinedDictionaryList
            .SelectMany(dict=> dict)
            .ToLookup(pair => pair.Key, pair => pair.Value)
            .ToDictionary(group => group.Key, group => group.Sum(value => value));
}

private Dictionary<string,int> ProcessDocument(string doc)
{
    return doc.Split(' ')
            .GroupBy(word => word)
            .OrderByDescending(g => g.Count())
            .Take(1000)
            .ToDictionary(r => r.Key, r => r.Count());
}

Then you can call it like:

string[] docs = new[] 
    {
        "This is a test sentence with some words with some words repeating like: is a test",
        "This is a test sentence with some words with some words repeating like: is a test",
        "This is a test sentence with some words",
        "This is a test sentence with some words",
    };

Dictionary<string, int> finalDictionary = GenerateTerms(docs);
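
For comparison, the same merge can be sketched with a plain loop instead of an intermediate list of dictionaries. This is only an illustration: the class name TermMerger, the method name MergeCounts, and the perDocLimit parameter are hypothetical and not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TermMerger
{
    // Hypothetical helper: sums per-document word counts into one dictionary.
    // perDocLimit mirrors the Take(1000) applied to each document above.
    public static Dictionary<string, int> MergeCounts(IEnumerable<string> docs, int perDocLimit = 1000)
    {
        var totals = new Dictionary<string, int>();
        foreach (string doc in docs)
        {
            // Most frequent words of this document only
            var top = doc.Split(' ')
                         .GroupBy(word => word)
                         .OrderByDescending(g => g.Count())
                         .Take(perDocLimit);

            foreach (var g in top)
            {
                int current;
                totals.TryGetValue(g.Key, out current); // current stays 0 when the key is new
                totals[g.Key] = current + g.Count();
            }
        }
        return totals;
    }
}
```

With the four sample documents above, MergeCounts(docs)["is"] would be 6 and MergeCounts(docs)["repeating"] would be 2, the same totals the LINQ version should produce.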

Upvotes: 2

Sando

Reputation: 667

Try something like this:

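    // Parallel lists: values[i] is the count for keys[i]; keys must be unique.
    var keys = new List<string>();
    var values = new List<int>();
    // Note: IndexOf is a linear search, so this is O(n^2) overall.
    var dictionary = keys.ToDictionary(x => x, x => values[keys.IndexOf(x)]);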

Upvotes: 0

Quintin Robinson

Reputation: 82375

Without any additional cruft the following should work.

return doc.Split(' ')
          .GroupBy(word => word)
          .ToDictionary(g => g.Key, g => g.Count());

Tailor it with Take, OrderBy, etc. as necessary for your situation.
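
As a rough sketch of that tailoring, the version below keeps only the most frequent words. The names TermCounter, TopTerms, and limit are hypothetical, and skipping empty entries from repeated spaces is an assumption, not something the question asked for.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TermCounter
{
    // Hypothetical helper: counts the words in one document and keeps
    // only the `limit` most frequent ones.
    public static Dictionary<string, int> TopTerms(string doc, int limit)
    {
        return doc.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                  .GroupBy(word => word)
                  .OrderByDescending(g => g.Count()) // decides which words survive Take
                  .Take(limit)
                  .ToDictionary(g => g.Key, g => g.Count());
    }
}
```

Note that the ordering only affects which entries Take keeps; Dictionary<TKey, TValue> itself makes no guarantee about enumeration order, so sort again when reading the results if order matters.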

Upvotes: 0
