Reputation: 183
I'm trying to get a distinct list of filenames for each bugid. I used LINQ to group all filenames related to each bug id, but I don't know how to remove the duplicate filenames within each group. In the file output I have multiple rows like this: bugid filename1 filename2 filename3 filename4 ............. There are multiple rows with the same bugid, and there are also duplicate filenames for each bug id. This is my code:
using System;
using System.Collections.Generic;
using System.Text;
using System.Linq;

namespace finalgroupquery
{
    class MainClass
    {
        public static void Main (string[] args)
        {
            List<bug> list2 = new List<bug>();
            using (System.IO.StreamReader reader1 = new System.IO.StreamReader(@"/home/output"))
            using (System.IO.StreamWriter file = new System.IO.StreamWriter(@"/home/output1"))
            {
                string line1;
                while ((line1 = reader1.ReadLine()) != null)
                {
                    string[] items1 = line1.Split('\t');
                    bug bg = new bug();
                    bg.bugid = items1[0];
                    for (int i = 1; i <= items1.Length - 1; i++)
                    {
                        bg.list1.Add(items1[i]);
                    }
                    list2.Add(bg);
                }
                var bugquery = from c in list2
                               group c by c.bugid into x
                               select new Container { BugID = x.Key, Grouped = x };
                foreach (Container con in bugquery)
                {
                    StringBuilder files = new StringBuilder();
                    files.Append(con.BugID);
                    files.Append("\t");
                    foreach (var x in con.Grouped)
                    {
                        files.Append(string.Join("\t", x.list1.ToArray()));
                    }
                    file.WriteLine(files.ToString());
                }
            }
        }
    }

    public class Container
    {
        public string BugID { get; set; }
        public IGrouping<string, bug> Grouped { get; set; }
    }

    public class bug
    {
        public List<string> list1 { get; set; }
        public string bugid { get; set; }
        public bug()
        {
            list1 = new List<string>();
        }
    }
}
Upvotes: 1
Views: 1599
Reputation: 1907
Try this code:
    var bugquery = from c in list2
                   group c by c.bugid into x
                   select new bug { bugid = x.Key, list1 = x.SelectMany(l => l.list1).Distinct().ToList() };
    foreach (bug bug in bugquery)
    {
        StringBuilder files = new StringBuilder();
        files.Append(bug.bugid);
        files.Append("\t");
        files.Append(string.Join("\t", bug.list1.ToArray()));
        file.WriteLine(files.ToString());
    }
Thanks to the combination of the SelectMany and Distinct LINQ operators, you can flatten the filename lists and remove duplicates in a single line.
SelectMany (from MSDN):
Projects each element of a sequence to an IEnumerable<T> and flattens the resulting sequences into one sequence.
Distinct (from MSDN):
Returns distinct elements from a sequence.
It also means that your Container class is no longer needed, as there's no need to iterate through the IGrouping<string, bug> collection anymore (here list1 contains all the bug-related filenames without duplicates).
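As a minimal, self-contained sketch of the same idea, the snippet below runs the group / SelectMany / Distinct combination on a few in-memory rows instead of a file. The sample bug ids and filenames are made up, DistinctFilesDemo and DistinctFilesPerBug are hypothetical names used only for illustration, and the tuple syntax assumes C# 7 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DistinctFilesDemo
{
    // Hypothetical helper: merges all rows that share a bug id
    // and drops duplicate filenames within each merged group.
    public static Dictionary<string, List<string>> DistinctFilesPerBug(
        IEnumerable<(string BugId, List<string> Files)> rows)
    {
        return rows
            .GroupBy(r => r.BugId)
            .ToDictionary(
                g => g.Key,
                // Flatten every row's list into one sequence, then remove duplicates.
                g => g.SelectMany(r => r.Files).Distinct().ToList());
    }

    public static void Main()
    {
        // Made-up sample data: bug 101 appears on two rows, bug 102 repeats a file.
        var rows = new List<(string BugId, List<string> Files)>
        {
            ("101", new List<string> { "a.cs", "b.cs" }),
            ("102", new List<string> { "c.cs", "c.cs" }),
            ("101", new List<string> { "b.cs", "d.cs" }),
        };

        foreach (var pair in DistinctFilesPerBug(rows))
        {
            Console.WriteLine(pair.Key + "\t" + string.Join("\t", pair.Value));
        }
    }
}
```

With this sample, each bug id ends up on a single output line, and b.cs and c.cs are each listed only once for their bug.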
Edit
As you may have some blank lines and/or empty strings after reading and parsing your file, you could use this code to get rid of them:
    using (System.IO.StreamReader reader1 = new System.IO.StreamReader(@"/home/sunshine40270/mine/projects/interaction2/fasil-data/common history/outputpure"))
    {
        string line1;
        while ((line1 = reader1.ReadLine()) != null)
        {
            if (!string.IsNullOrWhiteSpace(line1))
            {
                string[] items1 = line1.Split(new [] { '\t' }, StringSplitOptions.RemoveEmptyEntries);
                bug bg = new bug();
                bg.bugid = items1[0];
                for (int i = 1; i <= items1.Length - 1; i++)
                {
                    bg.list1.Add(items1[i]);
                }
                list2.Add(bg);
            }
        }
    }
You'll notice:
Lines (line1) are checked for emptiness as soon as they are retrieved from your stream (with !string.IsNullOrWhiteSpace(line1)).
To drop empty entries when calling the string.Split method, you can use the StringSplitOptions.RemoveEmptyEntries parameter.
Hope this helps.
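To make the effect of StringSplitOptions.RemoveEmptyEntries concrete, here is a small sketch comparing the two Split calls on one made-up line that contains a doubled tab and a trailing tab. SplitDemo and SplitFields are hypothetical names, not part of the code above.

```csharp
using System;

public static class SplitDemo
{
    // Hypothetical helper: splits one tab-separated line and drops empty fields.
    public static string[] SplitFields(string line)
    {
        return line.Split(new[] { '\t' }, StringSplitOptions.RemoveEmptyEntries);
    }

    public static void Main()
    {
        // Made-up line: an empty field after the bug id and a trailing tab.
        string line = "101\t\ta.cs\tb.cs\t";

        // Plain Split keeps the two empty strings.
        Console.WriteLine(line.Split('\t').Length);  // prints 5

        // RemoveEmptyEntries keeps only "101", "a.cs" and "b.cs".
        Console.WriteLine(SplitFields(line).Length); // prints 3
    }
}
```

Without the option, the empty strings would be added to list1 as if they were filenames.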
Upvotes: 1
Reputation: 4777
From your description it sounds like you want to do this:
    List<bug> bugs = new List<bug>();
    var lines = System.IO.File.ReadLines(@"/home/bugs");
    foreach (var line in lines) {
        string[] items = line.Split('\t');
        bug bg = new bug();
        bg.bugid = items[0];
        bg.list1 = items.Skip(1).OrderBy(f => f).Distinct().ToList();
        bugs.Add(bg);
    }
This will produce one bug object per input line, where each object holds a sorted list of distinct filenames.
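As a quick illustration of the per-line step, here is a sketch that applies it to a single made-up line. PerLineDemo and DistinctFiles are hypothetical names. Note one deliberate difference from the snippet above: this sketch calls Distinct before OrderBy and sorts with StringComparer.Ordinal, so that the final order comes straight from the sort.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PerLineDemo
{
    // Hypothetical helper: returns the filenames of one tab-separated line
    // (every field after the bug id), de-duplicated and sorted.
    public static List<string> DistinctFiles(string line)
    {
        return line.Split('\t')
                   .Skip(1)                                 // skip the bug id
                   .Distinct()                              // drop repeated filenames
                   .OrderBy(f => f, StringComparer.Ordinal) // sort by ordinal comparison
                   .ToList();
    }

    public static void Main()
    {
        // Made-up line: b.cs is listed twice.
        string line = "101\tb.cs\ta.cs\tb.cs";
        Console.WriteLine(string.Join(", ", DistinctFiles(line))); // prints a.cs, b.cs
    }
}
```

This only removes duplicates within one line; rows that share a bugid still produce separate objects unless they are grouped afterwards.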
Upvotes: 1