Ciaran Gallagher

Reputation: 4020

Identifying and grouping similar items in a collection of strings

I have a collection of strings like the following:

List<string> codes = new List<string>
{
    "44.01", "44.02", "44.03", "44.04", "44.05", "44.06", "44.07", "44.08", "46", "47.10"
};

Each string is made up of two components separated by a full stop: a prefix code and a subcode. Some of the strings don't have subcodes.

I want to be able to combine the strings whose prefixes are the same and output them as follows, along with the other codes:

44(01,02,03,04,05,06,07,08),46,47.10

I'm stuck at the first hurdle of this, which is how to identify and group together the codes whose prefix values are the same, so that I can combine them into a single string as you can see above.

Upvotes: 1

Views: 1274

Answers (8)

Rahul Singh

Reputation: 21795

Try this:

 var result = codes.Select(x => new { SplitArr = x.Split('.'), OriginalValue = x })
                   .GroupBy(x => x.SplitArr[0])
                   .Select(x => new 
                    {
                       Prefix = x.Key,
                       // Subcodes are only joined when the prefix occurs more than once
                       subCode = x.Count() > 1 ? 
                             String.Join(",", x.Select(z => z.SplitArr[1])) : "",
                       OriginalValue = x.First().OriginalValue
                    });

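You can print your desired output like this:

// Single codes are printed unchanged; grouped ones as prefix(sub1,sub2,...)
Console.WriteLine(String.Join(",", result.Select(item =>
    item.subCode == "" ? item.OriginalValue
                       : item.Prefix + "(" + item.subCode + ")")));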

Working Fiddle.

Upvotes: 5

Robert McKee

Reputation: 21487

This will work, including the correct formats for no subcodes, a single subcode, and multiple subcodes. It also doesn't assume the prefix or subcodes are numeric, so it leaves leading zeros as is. Your question didn't show what to do when you have a prefix without a subcode AND the same prefix with a subcode (44, 44.01), so it may not do what you want in that edge case. I have it ignore the prefix without a subcode there.

List<string> codes = new List<string>
{
    "44.01", "44.02", "44.03", "44.04", "44.05", "44.06", "44.07", "44.08", "46", "47.10"
};
var result=codes.Select(x => (x+".").Split('.'))
                   .Select(x => new
                   {
                       Prefix = x[0],
                       Subcode = x[1]
                   })
                   .GroupBy(k => k.Prefix)
                   .Select(g => new
                   {
                       Prefix = g.Key,
                       Subcodes = g.Where(s => s.Subcode!="").Select(s => s.Subcode)
                   })
                   .Select(x =>
                       x.Prefix +
                       (x.Subcodes.Count() == 0 ? string.Empty :
                        string.Format(x.Subcodes.Count()>1?"({0})":".{0}",
                         string.Join(",", x.Subcodes)))
                   ).ToArray();

Upvotes: 1

Graffito

Reputation: 1718

The old fashioned way:

List<string> codes = new List<string>() {"44.01", "44.05", "47", "42.02", "44.03" };
string output = "";
for (int i = 0; i < codes.Count; i++)
{
  // Pad with dots so that items[1] exists even when there is no subcode
  string[] items = (codes[i] + "..").Split('.');
  int pos1 = output.IndexOf("," + items[0] + "(");
  if (pos1 < 0) output += "," + items[0] + "(" + items[1] + ")"; // first occurrence of code : add it
  else
  { // Code already inserted : find the insert point (closing parenthesis of its group)
    int pos2 = output.IndexOf(')', pos1);
    output = output.Substring(0, pos2) + "," + items[1] + output.Substring(pos2);
  }
}
// Drop the leading comma and the empty "()" left by codes without a subcode
if (output.Length > 0) output = output.Substring(1).Replace("()", "");

Upvotes: 1

RMalke

Reputation: 4094

You can do it all in one clever LINQ query:

var grouped = codes.Select(x => x.Split('.'))
                   .Select(x => new
                   {
                       Prefix = int.Parse(x[0]),
                       Subcode = x.Length > 1 ? int.Parse(x[1]) : (int?)null
                   })
                   .GroupBy(k => k.Prefix)
                   .Select(g => new
                   {
                       Prefix = g.Key,
                       Subcodes = g.Where(s => s.Subcode.HasValue).Select(s => s.Subcode)
                   })
                   .Select(x =>
                       x.Prefix +
                       (x.Subcodes.Count() == 1 ? string.Format(".{0}", x.Subcodes.First()) :
                        x.Subcodes.Count() > 1 ? string.Format("({0})", string.Join(",", x.Subcodes))
                                                : string.Empty)
                   ).ToArray();
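
  1. First it splits each string into a code and a subcode
  2. Then it groups by code and collects all the subcodes of each group
  3. Finally it selects each group in the appropriate format

Note that int.Parse drops leading zeros, so "44.01" is output with subcode 1 rather than 01.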

Looking at the problem, I think you should stop just before the last Select and let the data presentation be done in another part/method of your application.
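
One way to read that suggestion, as a rough sketch only: keep the grouping in one method that returns plain data, and do the string formatting in another. The class and method names here (CodeGrouping, GroupByPrefix, Format) are made up for illustration, and subcodes are kept as strings rather than parsed, which is an assumption on my part so that leading zeros survive.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CodeGrouping
{
    // Hypothetical helper: groups codes by prefix and returns data only, no formatting.
    // Subcodes stay as strings so that "01" is not turned into 1.
    public static List<KeyValuePair<string, List<string>>> GroupByPrefix(IEnumerable<string> codes)
    {
        return codes
            .Select(c => c.Split('.'))
            .GroupBy(parts => parts[0])
            .Select(g => new KeyValuePair<string, List<string>>(
                g.Key,
                g.Where(parts => parts.Length > 1).Select(parts => parts[1]).ToList()))
            .ToList();
    }

    // Hypothetical helper: presentation only, applied to the grouped data.
    public static string Format(IEnumerable<KeyValuePair<string, List<string>>> groups)
    {
        return string.Join(",", groups.Select(g =>
            g.Value.Count == 0 ? g.Key :
            g.Value.Count == 1 ? g.Key + "." + g.Value[0] :
            g.Key + "(" + string.Join(",", g.Value) + ")"));
    }
}
```

With the list from the question, CodeGrouping.Format(CodeGrouping.GroupByPrefix(codes)) should give 44(01,02,03,04,05,06,07,08),46,47.10, and other parts of the application can use the grouped data without re-parsing the string.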

Upvotes: 1

user2371524

Reputation:

Outlined idea:

  • Use Dictionary<string, List<string>> for collecting your result

  • in a loop over your list, use string.Split() ... the first element will be your Dictionary key ... create a new List<string> there if the key doesn't exist yet

  • if the result of split has a second element, append that to the List

  • use a second loop to format that Dictionary to your output string
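
The outline above could be sketched roughly as follows. The names DictionaryGrouping and Combine are hypothetical, and the extra order list is my own addition: Dictionary does not promise any enumeration order, so the sketch remembers the order in which prefixes were first seen.

```csharp
using System;
using System.Collections.Generic;

public static class DictionaryGrouping
{
    // Hypothetical helper following the outline: a dictionary of prefix -> subcodes.
    public static string Combine(IEnumerable<string> codes)
    {
        var groups = new Dictionary<string, List<string>>();
        var order = new List<string>(); // prefixes in first-seen order (assumption: that order is wanted)

        // First loop: split each code and collect subcodes under their prefix.
        foreach (string code in codes)
        {
            string[] parts = code.Split('.');
            List<string> subcodes;
            if (!groups.TryGetValue(parts[0], out subcodes))
            {
                subcodes = new List<string>();
                groups[parts[0]] = subcodes;
                order.Add(parts[0]);
            }
            if (parts.Length > 1) subcodes.Add(parts[1]);
        }

        // Second loop: format each group.
        var pieces = new List<string>();
        foreach (string prefix in order)
        {
            List<string> subcodes = groups[prefix];
            if (subcodes.Count == 0) pieces.Add(prefix);
            else if (subcodes.Count == 1) pieces.Add(prefix + "." + subcodes[0]);
            else pieces.Add(prefix + "(" + string.Join(",", subcodes) + ")");
        }
        return string.Join(",", pieces);
    }
}
```

For the list "44.01", "44.05", "47", "42.02", "44.03" this should return 44(01,05,03),47,42.02.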

Of course, LINQ is possible too, e.g.

List<string> codes = new List<string>() {
    "44.01", "44.05", "47", "42.02", "44.03" };

var result = string.Join(",",
    codes.OrderBy(x => x)
    .Select(x => x.Split('.'))
    .GroupBy(x => x[0])
    .Select((x) =>
    {
        if (x.Count() == 0) return x.Key;
        else if (x.Count() == 1) return string.Join(".", x.First());
        else return x.Key + "(" + string.Join(",", x.Select(e => e[1]).ToArray()) + ")";
    }).ToArray());

Gotta love linq ... haha ... I think this is a monster.

Upvotes: 1

ManEatingCheese

Reputation: 236

You could go a couple ways... I could see you making a Dictionary<string,List<string>> so that you could have "44" map to a list of {".01", ".02", ".03", etc.} This would require you processing the codes before adding them to this list (i.e. separating out the two parts of the code and handling the case where there is only one part).

Or you could put them into a SortedSet and provide your own IComparer that knows these are codes and how to sort them (at least that'd be more reliable than grouping them alphabetically). Iterating over this SortedSet would still require special logic, though, so perhaps the Dictionary to List option above is still preferable.

In either case you would still need to handle a special case "46" where there is no second element in the code. In the dictionary example, would you insert a String.Empty into the list? Not sure what you'd output if you got a list {"46", "46.1"} -- would you display as "46(null,1)" or... "46(0,1)"... or "46(,1)" or "46(1)"?
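
For the SortedSet idea, a comparer might look something like the sketch below. CodeComparer is a made-up name, and the sketch assumes that both the prefix and the subcode are non-negative integers that fit in an int, which the question does not actually guarantee.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical comparer: orders codes by numeric prefix, then by numeric subcode.
// Assumes both parts are non-negative integers (not stated in the question).
public sealed class CodeComparer : IComparer<string>
{
    public int Compare(string a, string b)
    {
        string[] pa = a.Split('.');
        string[] pb = b.Split('.');

        int byPrefix = int.Parse(pa[0]).CompareTo(int.Parse(pb[0]));
        if (byPrefix != 0) return byPrefix;

        // A bare prefix such as "46" sorts before "46.1".
        int sa = pa.Length > 1 ? int.Parse(pa[1]) : -1;
        int sb = pb.Length > 1 ? int.Parse(pb[1]) : -1;
        int bySubcode = sa.CompareTo(sb);
        if (bySubcode != 0) return bySubcode;

        // Tie-break on the raw text so "44.1" and "44.01" are not treated as duplicates.
        return string.CompareOrdinal(a, b);
    }
}
```

Used as new SortedSet<string>(codes, new CodeComparer()), a set built from "47.10", "44.02", "46", "44.01", "100.5" would iterate as 44.01, 44.02, 46, 47.10, 100.5, whereas a plain alphabetical sort would put 100.5 first. The set only sorts; turning neighbouring items into groups still needs the special logic mentioned above.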

Upvotes: 0

Habib

Reputation: 223247

You can do:

var query = codes.Select(c => 
    new
    {
        SplitArray = c.Split('.'),  //to avoid multiple split
        Value = c
    })
    .Select(c => new
    {
        Prefix = c.SplitArray.First(), //you can avoid multiple split if you split first and use it later
        PostFix = c.SplitArray.Last(),
        Value = c.Value,
    })
    .GroupBy(r => r.Prefix)
    .Select(grp => new
    {
        Key = grp.Key,
        Items = grp.Count() > 1 ? String.Join(",", grp.Select(t => t.PostFix)) : "",
        Value = grp.First().Value,
    });

This is how it works:

  • Split each item in the list on the delimiter and populate an anonymous type with Prefix, PostFix and the original value
  • Then group on Prefix
  • After that, select the key and the PostFix values combined with String.Join

For output:

foreach (var item in query)
{
    if(String.IsNullOrWhiteSpace(item.Items))
        Console.WriteLine(item.Value);
    else
        Console.WriteLine("{0}({1})", item.Key, item.Items);
}

Output would be:

44(01,02,03,04,05,06,07,08)
46
47.10

Upvotes: 5

maraaaaaaaa

Reputation: 8163

This is the general idea, but I'm sure replacing the Substring calls with Regex would be a lot better as well:

// Assumes every prefix is exactly two characters long
List<string> newCodes = new List<string>();
foreach (string sub1 in codes.Select(item => item.Substring(0, 2)).Distinct())
{
    // Subcodes are whatever follows the "NN." prefix; codes without a subcode are skipped
    var subs = codes.Where(item => item.Length > 3 && item.Substring(0, 2) == sub1)
                    .Select(item => item.Substring(3));
    StringBuilder code = new StringBuilder();
    code.Append(sub1);
    if (subs.Any())
        code.Append("(" + string.Join(",", subs) + ")");
    newCodes.Add(code.ToString());
}
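
The Regex variant mentioned above could look roughly like this. It is only a sketch: the pattern is an assumption (a prefix containing no dots, optionally followed by a dot and a subcode), and RegexGrouping and Combine are made-up names. It no longer depends on the prefix being two characters long, and it wraps even a single subcode in parentheses, so 47.10 comes out as 47(10).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexGrouping
{
    // Assumed pattern: a prefix with no dots, optionally followed by ".subcode".
    private static readonly Regex CodePattern =
        new Regex(@"^(?<prefix>[^.]+)(?:\.(?<sub>.+))?$");

    // Hypothetical helper: groups by the matched prefix instead of by Substring(0, 2).
    public static List<string> Combine(IEnumerable<string> codes)
    {
        return codes
            .Select(c => CodePattern.Match(c))
            .Where(m => m.Success) // strings that do not fit the pattern are skipped
            .GroupBy(m => m.Groups["prefix"].Value)
            .Select(g =>
            {
                var subs = g.Where(m => m.Groups["sub"].Success)
                            .Select(m => m.Groups["sub"].Value)
                            .ToList();
                return subs.Count == 0 ? g.Key : g.Key + "(" + string.Join(",", subs) + ")";
            })
            .ToList();
    }
}
```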

Upvotes: 0
