Reputation: 2057
I'm using the following code to split an array of strings into a list of distinct words.
private List<string> GenerateTerms(string[] docs)
{
    return docs.SelectMany(doc => ProcessDocument(doc)).Distinct().ToList();
}
private IEnumerable<string> ProcessDocument(string doc)
{
    return doc.Split(' ')
              .GroupBy(word => word)
              .OrderByDescending(g => g.Count())
              .Select(g => g.Key)
              .Take(1000);
}
What I want to do is return a Dictionary<string, int> instead of the list.
Could anyone help? Thanks in advance.
Upvotes: 2
Views: 2869
Reputation: 33391
Try this:
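string[] docs = {"aaa bbb", "aaa ccc", "sss, ccc"};
var result = docs.SelectMany(doc => doc.Split())
                 .GroupBy(word => word)
                 .OrderByDescending(g => g.Count())
                 // Take must come before ToDictionary; otherwise the result is an
                 // IEnumerable<KeyValuePair<string, int>>, not a Dictionary.
                 .Take(1000)
                 .ToDictionary(g => g.Key, g => g.Count());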
EDIT:
var result = docs.SelectMany(
                 doc => doc.Split()
                           .GroupBy(word => word)
                           .OrderByDescending(g => g.Count())
                           .Take(1000))
                 .Select(g => new {Word = g.Key, Cnt = g.Count()})
                 .GroupBy(t => t.Word)
                 .ToDictionary(g => g.Key, g => g.Sum(t => t.Cnt));
Upvotes: 1
Reputation: 223412
string doc = "This is a test sentence with some words with some words repeating like: is a test";
var result = doc.Split(' ')
                .GroupBy(word => word)
                .OrderByDescending(g => g.Count())
                .Take(1000)
                .ToDictionary(r => r.Key, r => r.Count());
EDIT:
I believe you want a single dictionary built from the array of strings, with each word as a key and its total count as the value. Since a dictionary can't contain duplicate keys, you won't need Distinct.
Rewrite your methods as:
private Dictionary<string, int> GenerateTerms(string[] docs)
{
    List<Dictionary<string, int>> combinedDictionaryList = new List<Dictionary<string, int>>();
    foreach (string str in docs)
    {
        // Add each document's dictionary to the list
        combinedDictionaryList.Add(ProcessDocument(str));
    }
    // Merge the list of dictionaries into a single dictionary, summing counts per word
    return combinedDictionaryList
        .SelectMany(dict => dict)
        .ToLookup(pair => pair.Key, pair => pair.Value)
        .ToDictionary(group => group.Key, group => group.Sum(value => value));
}
private Dictionary<string, int> ProcessDocument(string doc)
{
    return doc.Split(' ')
              .GroupBy(word => word)
              .OrderByDescending(g => g.Count())
              .Take(1000)
              .ToDictionary(r => r.Key, r => r.Count());
}
Then you can call it like this:
string[] docs = new[]
{
    "This is a test sentence with some words with some words repeating like: is a test",
    "This is a test sentence with some words with some words repeating like: is a test",
    "This is a test sentence with some words",
    "This is a test sentence with some words",
};
Dictionary<string, int> finalDictionary = GenerateTerms(docs);
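If you would rather not build an intermediate list of dictionaries, the same merge can probably be done by accumulating into one dictionary as you go. The following is only a sketch under a few assumptions: the class name TermCounter, the method name MergeTermCounts and the topPerDoc parameter are made up for illustration, and I've added StringSplitOptions.RemoveEmptyEntries so that repeated spaces don't produce an empty-string "word" (the code in the question does not do this).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TermCounter
{
    // Hypothetical helper: takes each document's most frequent words
    // (up to topPerDoc) and sums their counts into a single dictionary.
    public static Dictionary<string, int> MergeTermCounts(IEnumerable<string> docs, int topPerDoc = 1000)
    {
        var totals = new Dictionary<string, int>();
        foreach (string doc in docs)
        {
            var topWords = doc
                .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries) // assumption: skip empty tokens
                .GroupBy(word => word)
                .OrderByDescending(g => g.Count())
                .Take(topPerDoc);

            foreach (var g in topWords)
            {
                int current;
                // TryGetValue leaves current at 0 when the word is not present yet.
                totals.TryGetValue(g.Key, out current);
                totals[g.Key] = current + g.Count();
            }
        }
        return totals;
    }
}
```

Note that, like the version above, this trims to the top words per document before summing, so a word that is rare in every document but common overall can still be dropped.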
Upvotes: 2
Reputation: 667
Try something like this:
var keys = new List<string>();
var values = new List<string>();
var dictionary = keys.ToDictionary(x => x, x => values[keys.IndexOf(x)]);
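For pairing two parallel lists like this, Enumerable.Zip is likely a simpler fit, since keys.IndexOf(x) scans the list again for every key. A minimal sketch, assuming both lists have matching positions (PairingExample and Pair are names I made up for illustration):

```csharp
using System.Collections.Generic;
using System.Linq;

public static class PairingExample
{
    // Hypothetical helper: pairs keys[i] with values[i] and builds a dictionary.
    // Zip stops at the shorter sequence; ToDictionary throws ArgumentException
    // if the same key appears twice.
    public static Dictionary<string, string> Pair(IEnumerable<string> keys, IEnumerable<string> values)
    {
        return keys.Zip(values, (k, v) => new { Key = k, Value = v })
                   .ToDictionary(p => p.Key, p => p.Value);
    }
}
```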
Upvotes: 0
Reputation: 82375
Without any additional cruft the following should work.
return doc.Split(' ')
.GroupBy(word => word)
.ToDictionary(g => g.Key, g => g.Count());
Tailor it via Take, OrderBy, etc. as necessary for your situation.
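One thing to keep in mind when tailoring: Dictionary<TKey, TValue> does not guarantee any enumeration order, so an OrderBy applied before ToDictionary only affects which entries survive a Take, not the order you read them back in. If you need the words sorted, one option is to sort when reading. A rough sketch (WordCounts, Count and Top are names I made up for illustration):

```csharp
using System.Collections.Generic;
using System.Linq;

public static class WordCounts
{
    // Count every word; the resulting dictionary carries no ordering guarantee.
    public static Dictionary<string, int> Count(string doc)
    {
        return doc.Split(' ')
                  .GroupBy(word => word)
                  .ToDictionary(g => g.Key, g => g.Count());
    }

    // Hypothetical helper: sort by count and trim at read time.
    public static List<KeyValuePair<string, int>> Top(Dictionary<string, int> counts, int limit)
    {
        return counts.OrderByDescending(pair => pair.Value)
                     .Take(limit)
                     .ToList();
    }
}
```

For example, WordCounts.Top(WordCounts.Count("a b a c a b"), 2) gives "a" with 3 followed by "b" with 2.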
Upvotes: 0