10101
10101

Reputation: 2402

Linq query where first column from DataTable is compared to second column for duplicates

I have merged DataTable that consist duplicates in both columns.

DataTable structure: "Worknumber Username"

1234 John
1235 Mike
1235 Mike
1236 Donald
1236 Jack

I need to catch

1236 Donald
1236 Jack

and preferably output it in format

1236 Donald Jack

I have been trying to create Linq query for this purpose for some time now (quite long), but I can't figure out the right way.

Here is what I have so far. This works, I just tested that column names are correct with dataRow.Field<string>("WORKNUMBER"):

        public static DataTable ProjectsInLiinos = new DataTable();

        ...

        MainProcess.MergedTable();

        foreach (DataRow dataRow in MainProcess.ProjectsInLiinos.Rows)
        {
                Console.WriteLine(dataRow.Field<string>("WORKNUMBER") + dataRow.Field<string>("USERNAME"));
        }

Here is Linq query:

var queryCustomers = from data in MainProcess.ProjectsInLiinos.AsEnumerable()
                                 where data.Field<string>("WORKNUMBER").Count() > 1 && data.Field<string>("USERNAME").Count() < 1
                                 select data;

It does not produce anything. I can't figure out how I can make statement where first column duplicates are compared to second column duplicates?

Otherwise I could have grouped dataRow.Field<string>("WORKNUMBER") + dataRow.Field<string>("USERNAME") and then check list for duplicates, but it will produce then:

1235Mike
1235Mike

and in reverse

1234John
1236Donald
1236Jack

Upvotes: 2

Views: 1136

Answers (5)

Roman Ryzhiy
Roman Ryzhiy

Reputation: 1646

Assuming you have

class MyClass : IEquatable<MyClass> // IEquatable is required to perform Dictinct()
{
    public string Worknumber { get; set; }
    public string Username { get; set; }

    public bool Equals(MyClass other)
    {
        if (this.Username != other.Username)
            return false;

        if (this.Worknumber != other.Worknumber)
            return false;

        return true;
    }

    public override int GetHashCode() // required for objects comparison
    {
        int hashWorknumber = Worknumber == null ? 0 : Worknumber.GetHashCode();
        int hashUsername = Username == null ? 0 : Username.GetHashCode();

        return hashWorknumber ^ hashUsername;
    }
}

and a mock like this

List<MyClass> myclasses = new List<MyClass>();

myclasses.Add(new MyClass { Worknumber = "1234", Username = "John" });
myclasses.Add(new MyClass { Worknumber = "1235", Username = "Mike" });
myclasses.Add(new MyClass { Worknumber = "1235", Username = "Mike" });
myclasses.Add(new MyClass { Worknumber = "1236", Username = "Donald" });
myclasses.Add(new MyClass { Worknumber = "1236", Username = "Jack" });

we make a simple lambda query

var result = myclasses
    .Distinct()
    .GroupBy(e => e.Worknumber) // grouping
    .Select(e => new
    {
        Worknumber = e.Key,
        Count = e.Count(),
        Users = string.Join(" ", e.Select(p => p.Username)) // joining names
    })
    .Where(e => e.Count > 1) // filtering duplicates only
    .ToList();

Upvotes: 1

Mong Zhu
Mong Zhu

Reputation: 23732

as already suggested use grouping by the first column first. Then you can find the different usernames by a second grouping in the Where filter:

void Main()
{
    string column1 = "Worknumber";
    string column2 = "Username";
    DataTable table = new DataTable();
    table.Columns.Add(column1, typeof(int));
    table.Columns.Add(column2, typeof(string));

    table.Rows.Add(1234, "john");
    table.Rows.Add(1235, "Mike");
    table.Rows.Add(1235, "Mike");
    table.Rows.Add(1236, "Donald");
    table.Rows.Add(1236, "Jack");
    table.Rows.Add(1237, "Val");
    table.Rows.Add(1237, "Nick");
    var result = table.AsEnumerable()
         .GroupBy(x => x.Field<int>(column1))
         .Where(g => g.Count() > 1 && g.GroupBy(x => x.Field<string>(column2)).Count() > 1)
         .Select(x => new {WorkNumber = x.Key, AllUsers = string.Join(" ", x.Select(u => u.Field<string>(column2)))}).Dump();       

}

I used an anonymous type here, but you can easily write a custom class with an appropriate constructor to fit in the user name values.

Upvotes: 1

Cetin Basoz
Cetin Basoz

Reputation: 23797

You can use group by:

void Main()
{
    DataTable tbl = new DataTable();
    tbl.Columns.Add("Id", typeof(int));
    tbl.Columns.Add("Name", typeof(string));
    tbl.Rows.Add(1234, "John");
    tbl.Rows.Add(1235, "Mike");
    tbl.Rows.Add(1235, "Mike");
    tbl.Rows.Add(1236, "Donald");
    tbl.Rows.Add(1236, "Jack");

    var dupes = tbl.AsEnumerable()
    .GroupBy(t => t.Field<int>("Id"))
    .Where(x => x.Select(y => y.Field<string>("Name")).Distinct().Count() > 1)
    .Select(x => new
    {
        Id = x.Key,
        Names = string.Join(",", x.Select(y => y.Field<string>("Name")))
    });
    foreach (var dupe in dupes)
    {
        Console.WriteLine($"{dupe.Id}:{dupe.Names}");
    }
}

I think it would much better to use the source anyway, rather than a DataTable in between.

Upvotes: 1

Derviş Kayımbaşıoğlu
Derviş Kayımbaşıoğlu

Reputation: 30565

check this simple demo

public class C
{
    public int Worknumber { get; set; }
    public string Username { get; set; }

    public C(int worknumber, string username)
    {
        Worknumber = worknumber;
        Username = username;
    }
}

void Main()
{
    var lst = new List<C>(){
        new C(1234, "Mike"),
        new C(1235, "Mike"),
        new C(1235, "Mike"),
        new C(1236, "Donald"),
        new C(1236, "Jake")
    };

    var p = lst.GroupBy(t => t.Worknumber)
        .Select(t => new { Key = t.Key, l = t.Select(x => x.Username).ToList() })
        .Where(x => x.l.Distinct().Count() > 1)
        .Select(x => String.Format("{0} {1}", x.Key,  string.Join(' ', x.l)))
        .Dump();
}

Upvotes: 1

jdweng
jdweng

Reputation: 34421

Try following :

            DataTable dt = new DataTable();
            dt.Columns.Add("Worknumber",typeof(int));
            dt.Columns.Add("Username",typeof(string));
            
            dt.Rows.Add(new object[] {1234, "John"});
            dt.Rows.Add(new object[] {1235, "Mike"});
            dt.Rows.Add(new object[] {1235, "Mike"});
            dt.Rows.Add(new object[] {1236, "Donald"});
            dt.Rows.Add(new object[] {1236, "Jack"});

            var distinct = dt.AsEnumerable()
                .GroupBy(x => x.Field<int>("Worknumber"))
                .Select(x => new { workNumber = x.Key, user = x.Select(y => y.Field<string>("Username")).Distinct().ToList() })
                .Where(x => x.user.Count > 1)
                .Select(x => new { workNumber = x.workNumber, user = string.Join(" ", x.user) })
                .ToList();

Upvotes: 2

Related Questions