user1314404

Reputation: 1295

Form a list of distinct words from a set of lists containing repeated words in C#

I have a model:

public class CompanyModel1
{
    public string compnName1 { get; set; }
    public string compnKeyProcesses1 { get; set; }
}

then I form a list:

List<CompanyModel1> companies1 = new List<CompanyModel1>();

If I access its values:

var newpairs = companies1.Select(x => new { Name = x.compnName1, Processes = x.compnKeyProcesses1 });
foreach (var item in newpairs)
{
    string CName = item.Name;
    string Process = item.Processes;
}

I get values like:

CName = "name1"
Process = "Casting, Casting, Casting, Welding, brazing & soldering"

and

CName = "name2"
Process = "Casting, Welding, Casting, Forming & Forging, Moulding"

etc.

Now I want to form a list of the distinct processes, count them, and find out how many different names each process appears under.

For example, from the two entries above I need to form a list like the following:

"Casting, Welding, brazing & soldering, Forming & Forging, Moulding"

Counting them gives 5 distinct processes, with the following frequency across names:

"Casting" appears in 2 names
"Welding" appears in 2 names
"brazing & soldering" appears in 1 names
"Forming & Forging" appears in 1 names
"Moulding" appears in 1 names

I think LINQ can help with this problem, maybe with something like this:

var frequency = Process
    .SelectMany(u => u.Split(new string[] { ", " }, StringSplitOptions.None))
    .GroupBy(s => s)
    .ToDictionary(g => g.Key, g => g.Count());

var numberOfProcess = frequency.Count;

var numberOfNameWithProcessOne = frequency["Process1"];

But how can I put that into the foreach loop, apply it to all the names and processes I have, and get the result I want?

Upvotes: 2

Views: 426

Answers (1)

Ideae

Reputation: 622

// Split, trim and de-duplicate per company, so a process counts at most once per name.
var processes = companies1
    .SelectMany(c => c.compnKeyProcesses1
        .Split(new char[] { ',' })
        .Select(s => s.Trim())
        .Distinct())
    .GroupBy(s => s)
    .ToDictionary(g => g.Key, g => g.Count());
foreach (var process in processes)
{
    Console.WriteLine("\"{0}\" appears in {1} names", process.Key, process.Value);
}

This first selects only the distinct processes of each individual company, then uses SelectMany to flatten them into one master list, so each process appears in that list exactly once per company that has it. Finally, we group the master list by process, count the occurrences in each group, and put them into a dictionary of process => count.
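To check the query against the two sample rows from the question, here is a minimal self-contained sketch. The ProcessCounter class and its CountNamesPerProcess method are hypothetical names introduced only for illustration; the query itself is the one shown above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class CompanyModel1
{
    public string compnName1 { get; set; }
    public string compnKeyProcesses1 { get; set; }
}

// Hypothetical wrapper (not part of the original answer).
public static class ProcessCounter
{
    // Maps each process to the number of companies that list it at least once.
    public static Dictionary<string, int> CountNamesPerProcess(IEnumerable<CompanyModel1> companies)
    {
        return companies
            .SelectMany(c => c.compnKeyProcesses1
                .Split(',')
                .Select(s => s.Trim())
                .Distinct())            // one entry per process per company
            .GroupBy(s => s)
            .ToDictionary(g => g.Key, g => g.Count());
    }
}
```

Called with the "name1" and "name2" rows from the question, the resulting dictionary should hold five keys, with "Casting" and "Welding" mapped to 2 and the other three processes mapped to 1.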

EDIT:

Here is another solution that groups the data in a dictionary, to allow showing the associated companies with each process. The dictionary is from Process Names -> List of Company Names.

Func<string, IEnumerable<string>> stringToListConverter =
    s => s.Split(new char[] { ',' }).Select(ss => ss.Trim());
var companiesDict = companies1.ToDictionary(
    c => c.compnName1,
    c => stringToListConverter(c.compnKeyProcesses1).Distinct());
var processesAll = companies1
    .SelectMany(c => stringToListConverter(c.compnKeyProcesses1))
    .Distinct();
var processesToNames = processesAll.ToDictionary(
    s => s,
    s => companiesDict.Where(d => d.Value.Contains(s)).Select(d => d.Key).ToList());
foreach (var processToName in processesToNames)
{
    List<string> companyNames = processToName.Value;
    Console.WriteLine("\"{0}\" appears in {1} names : {2}", processToName.Key, companyNames.Count, String.Join(", ", companyNames));
}

I've saved the stringToListConverter Func delegate to convert the process string into a list, and used that delegate in two of the queries.
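As a possible alternative (a different technique from the one above, not a replacement for it), the same process => company names mapping can be built in a single pass by pairing each process with its company name and grouping the pairs. The ProcessIndex class and NamesByProcess method are hypothetical names. One difference worth noting: the version above calls ToDictionary keyed on compnName1, which throws if two companies share a name, while this sketch would simply list the repeated name twice.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class CompanyModel1
{
    public string compnName1 { get; set; }
    public string compnKeyProcesses1 { get; set; }
}

// Hypothetical helper (not part of the original answer).
public static class ProcessIndex
{
    // Maps each process to the names of the companies that list it.
    public static Dictionary<string, List<string>> NamesByProcess(IEnumerable<CompanyModel1> companies)
    {
        return companies
            .SelectMany(c => c.compnKeyProcesses1
                .Split(',')
                .Select(s => s.Trim())
                .Distinct()
                // Pair every distinct process with the company it came from.
                .Select(p => new { Process = p, Name = c.compnName1 }))
            .GroupBy(x => x.Process, x => x.Name)
            .ToDictionary(g => g.Key, g => g.ToList());
    }
}
```

The count for a process is then the Count of its list, and String.Join(", ", names) gives the printable company list.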

This query would be more readable if the CompanyModel1 class stored the compnKeyProcesses1 field as a List<string> instead of one big string. That way you could query the list directly instead of having to split, select, and trim every time.
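A rough sketch of what that could look like, assuming you are free to change the model. CompanyModel2, its property names, the FromRaw factory, and ProcessStats are all hypothetical and not part of the original code. Dropping empty entries is an extra assumption added here to guard against stray commas.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical variant of CompanyModel1 that stores processes as a list.
public class CompanyModel2
{
    public string Name { get; set; }
    public List<string> KeyProcesses { get; set; }

    // Parses the comma-separated string once, when the model is created.
    public static CompanyModel2 FromRaw(string name, string rawProcesses)
    {
        return new CompanyModel2
        {
            Name = name,
            KeyProcesses = rawProcesses
                .Split(',')
                .Select(s => s.Trim())
                .Where(s => s.Length > 0)   // assumption: ignore empty entries
                .Distinct()
                .ToList()
        };
    }
}

public static class ProcessStats
{
    // The lists are already trimmed and de-duplicated, so the query stays short.
    public static Dictionary<string, int> CountNamesPerProcess(IEnumerable<CompanyModel2> companies)
    {
        return companies
            .SelectMany(c => c.KeyProcesses)
            .GroupBy(p => p)
            .ToDictionary(g => g.Key, g => g.Count());
    }
}
```

With the parsing moved into one place, every later query works on clean lists and does not need to repeat the Split and Trim calls.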

Upvotes: 3
