Reputation: 573
I am having trouble designing an approach for taking data from a CSV into business objects. I'm starting by parsing the CSV and getting each row into a DataTable and that is where my mental block starts.
I've got the following classes where APDistribution is considered a child of Voucher with a 1:Many relationship:
public class Voucher
{
public string GPVoucherNumber { get; set; }
public string VendorID { get; set; }
public string TransactionDescription { get; set; }
public string Title { get; set; }
public string DocNumber { get; set; }
public DateTime DocDate { get; set; }
public decimal PurchaseAmount { get; set; }
public IEnumerable<APDistribution> Distributions { get; set; }
}
public class APDistribution
{
public string AccountNumber { get; set; }
public decimal Debit { get; set; }
public decimal Credit { get; set; }
public string DistributionReference { get; set; }
}
My CSV looks like this. Several fields can repeat representing the Voucher transaction (Vendor, Title Invoice Number, Invoice Amount, etc), and some fields are the Distribution detail (Journal Account Code, Journal Amount).
I began by thinking I could use Linq to project onto my business objects but I'm failing to see how I can structure the query to do that in one pass. I find myself wondering if I can do one query to project into a Voucher collection, one to project into an APDistribution collection, and then some sort of code to properly associate them.
I started with the following where I am grouping by the fields that should uniquely define a Voucher, but that doesn't work because the projection is dealing with an anonymous type instead of the DataRow.
var vouchers =
from row in invoicesTable.AsEnumerable()
group row by new { vendor = row.Field<string>("Vendor Code"), invoice = row.Field<string>("Vendor Invoice Number") } into rowGroup
select new Voucher
{ VendorID = rowGroup.Field<string>("Vendor Code") };
Is this achievable without introducing complex Linq that a future developer (myself included) could have difficulty understanding/maintaining? Is there a simpler approach without Linq that I'm overlooking?
Upvotes: 0
Views: 320
Reputation: 3063
You are looking for a Linq join. See the documentation here for more greater depth.
Where it appears that you are running into trouble however, is that on your 2 objects you need something for the query to compare against, like maybe adding public string VendorID { get; set; }
to the APDistribution
class, if possible. I would assume that the CSV files would have something that ties an APDistribution
back to a Voucher
, so whatever it is, make sure it's in both classes so you can relate one to the other. The name doesn't need to be the same in both classes but it should be. More importantly is that you now have something that an equality comparer can use for the join operation.
Now personally, I don't like big gnarly queries if I can break them apart and make things easier. Too much to reason about all at once, and you've indicated that you agree. So my approach is to divide and conquer as follows.
First, run queries to project the CSV data into discrete objects, like so:
var voucherRows =
from row in invoicesTable.AsEnumerable()
Select New Voucher {
VendorID = row.Field<string>("Vendor Code")
// other properties to populate
};
and
var distributionRows =
from row in distributionsTable.AsEnumerable()
Select New APDistribution {
VendorID = row.Field<string>("Vendor Code"),
// other properties to populate
};
At this point you have 2 data sets that are related in domain terms but not yet associated in code. Now you can compose the queries together in the Join query and the join starts to look a lot easier, maybe something like:
var vouchers =
from row in voucherRows
join dist in distributionRows
on row.VendorId equals dist.VendorId
into distGroup
select new Voucher
{ VendorID = row.VendorID,
// other properties to populate
Distributions = distGroup.ToList()
};
You'll have to modify the queries to your needs, but this breaks them down into 3 distinct operations that are all designed to do 1 thing, thus easier to read, reason about, debug, and modify later. If you need to group the vouchers you can at this point, but this should get you moving. Also, if needed, you can add a validation step or other processing in between the initial CSV queries and the join and you don't have to rewrite your queries, with the exception of changing some input variable names on the join.
Also, disclaimer that I did NOT build these queries in an IDE before posting so you may have some typos or missed symbols to deal with, but I'm pretty sure I have it right. Sorry in advance if you find anything aggravating.
Upvotes: 1
Reputation: 187
While Linq can be cool and add efficiencies, it doesn't add value if you can't be sure the code is correct today, and can't understand it tomorrow. Maybe using Linq in this case is Premature Optimization.
Start with a non-Linq solution that is verifiably accurate without being needlessly inefficient, and then optimize later if performance becomes a problem.
Here's how I might tackle this:
var vouchers = new Dictionary<string, Voucher>();
foreach (DataRow row in invoicesTable)
{
string vendor = row.Field<string>("Vendor Code");
string invoice = row.Field<string>("Vendor Invoice Number");
string voucherKey = vendor + "|" + invoice;
if (!vouchers.ContainsKey(voucherKey))
{
vouchers.Add(voucherKey, new Voucher { VendorID = vendor, DocNumber = invoice });
}
vouchers[voucherKey].Distributions.Add(new APDistribution { AccountNumber = row.Field<string>("Journal Account Code") });
}
If this will be processing a large number of rows, you can tune this a bit by preallocating the Dictionary to an estimate of the number of unique vendors:
var vouchers = new Dictionary<string, Voucher>(invoicesTable.Rows.Count * 0.8);
Upvotes: 0
Reputation: 588
The general idea is:
invoicesTable
.AsEnumerable()
.GroupBy(x=> new { row.Field<string>("Vendor Code"), row.Field<string>("Vendor Invoice Number")})
.Select(grouping =>
new Voucher
{
VendorID = grouping.First().Field<string>("VendorId") /* and so on */
Distributions = grouping.Select(somerow => new redistribution {AccountNumber = somerow.Field<string>("AccountNumber") /* and so on */}
}
But this is not the most elegant way.
Upvotes: 1