ConnorU
ConnorU

Reputation: 1409

Linq query returning duplicates

So I have two DataTables, both taken from the same ODBC data source. One has rows for sales (Estado='VENTAS') and another for Estado='EXIST' (the database creates row pairs for each sale, one indicates the sale information and the other the information about stock movements off the store's deposit, so in order to know what store made a sale, I have to match each VENTAS row to the corresponding EXIST row [by the 'Documento' ID field] and get the "Deposito" value off the Exist row).

I built this query to do that (the ODBC driver I'm using doesn't allow for joins so doing it in Linq was the next best thing)

            DataTable dtResult = new DataTable();

            dtResult.Columns.Add("Documento", typeof(Int64)); 
            dtResult.Columns.Add("Fecha", typeof(DateTime));
            dtResult.Columns.Add("Articulo", typeof(string));
            dtResult.Columns.Add("Deposito", typeof(string));
            dtResult.Columns.Add("ImpDMn", typeof(decimal));
            dtResult.Columns.Add("Cantidad", typeof(decimal));
            dtResult.Columns.Add("Partida", typeof(string));

            var result = from dataRows1 in VentasDT.AsEnumerable() //VentasDT is the DataTable with the VENTAS data
                         join dataRows2 in ExistDT.AsEnumerable() // ExistDT is the DataTable with the EXIST data
                         on dataRows1.Field<Int64>("Documento") equals dataRows2.Field<Int64>("Documento")
                         select dtResult.LoadDataRow(new object[]
         {
            dataRows1.Field<Int64>("Documento"),
            dataRows1.Field<DateTime>("Fecha"),
            dataRows1.Field<string>("Articulo"),
            dataRows2.Field<string>("Deposito"),
            dataRows1.Field<decimal>("ImpDMn"),
            dataRows1.Field<decimal>("Cantidad"),
            dataRows1.Field<string>("Partida"),
          }, false);
            result.CopyToDataTable();

and it works but for some reason it duplicates some rows. I checked the rows that were duplicated and they are identical to each other. Their Document numbers are equal (and there is only one VENTAS and one EXIST row per Documento number, including for these duplicates) so I don't know why they're getting duplicated. I checked on the source Datatables I'm joining and likewise, there is only one corresponding line in each for the duplicates. The EXIST table is larger than the VENTAS table (it also records transfers between stores that aren't sales) but the joined table is larger than either (yet not as large as both combined)

VENTAS: 85123
EXIST: 116857
JOIN: 141287

Is there something wrong with my query? Should I include some GROUP BY clause or .Distinct or should this structure (which as far as I've seen should mimic a left join) have been enough?

Upvotes: 1

Views: 958

Answers (1)

Sergey Kalinichenko
Sergey Kalinichenko

Reputation: 726569

Since the only reason that you are doing this join is to get the value of the "Deposito" fields from ExistDT, you can replace joining with a dictionary lookup. Here is how you can do it:

var depositoPorDocumento = ExistDT.AsEnumerable().ToDictionary(
    dr => dr.Field<Int64>("Documento"),
    dr => dr.Field<string>("Deposito")
);
foreach (var vr in VentasDT.AsEnumerable()) {
    Int64 id = Field<Int64>("Documento");
    string deposito;
    if (!depositoPorDocumento.TryGetValue(id, out deposito)) {
        continue;
    }
    dtResult.LoadDataRow(new object[]
     {
        id,
        vr.Field<DateTime>("Fecha"),
        vr.Field<string>("Articulo"),
        deposito,
        vr.Field<decimal>("ImpDMn"),
        vr.Field<decimal>("Cantidad"),
        vr.Field<string>("Partida"),
      }, false);
}
result.CopyToDataTable();

Upvotes: 1

Related Questions