Reputation: 460108
Background:
I'm importing data from a MySQL database into a SQL-Server database(for reports and later a SSAS-Cube). I want to normalize the data at the same time. I want to group repeating Ticket_IDs
to one record in a table Contact
with other useful informations and leave the rawdata in the sub-table ContactDetail
(with foreignkey to Contact). Hence every record in Contact
has a unique Ticket_ID
.
I've decided to use strong typed datasets for the import. Now i'm wondering what's the best way to detect if i've already added a Ticket_ID. I could check for it in every loop(~100000 records) but i'm assuming that there is a better/faster way.
Simplified sample-data:
Ticket_ID ID fiContact
89442226 1 1
89442226 2 1
89442226 3 1
89442261 4 2
89442261 5 2
89442354 6 3
89442359 7 4
89442359 8 4
89442367 9 5
89442504 10 6
This should be the Contact
-table
Ticket_ID idContact
89442226 1
89442261 2
89442354 3
89442359 4
89442367 5
89442504 6
Question:
Is it possible with LINQ/LINQ-to-DataSet to group by Ticket_ID and get a list of ContactDetailRows for every ContactRow? I know there is a GroupBy-Extension, but i'm unsure how to use and if it does what i need(keep the ContactDetail-Rows, f.e. like a dicitonary with Ticket_ID as key and a List(of EmailRow)
as value).
This is what i have(simplified):
For Each srcEmail In src.email 'i want to group src.email by Ticket_ID'
'so far i check for existence in every loop'
Dim res = From c In dest.Contact
Where c.Ticket_ID = srcEmail.ticket_id
If Not res.Any Then
'create new Contact
Dim newContact = Me.dest.Contact.NewContactRow
newContact.Ticket_ID = srcEmail.ticket_id
' ..... '
dest.Contact.AddContactRow(newContact)
End If
'TODO: create ContactDetail row and add it to the DataTable '
Next
src
: typed DataSet(MySQL)src.email
: typed DataTable => into ContactDetail
dest
: typed DataSet(SQL-Server)dest.Contact
typed DataTabledest.ContactDetail
typed DataTable with fk to Contact
I would prefer VB.NET because i'm yet not as familiar with LINQ and the Syntax is quite different in C#.
Edit:
Thanks to @Magnus i've get it going in the following way:
Dim emailsPerTicketID = src.email.ToLookup(Function(email) email.ticket_id)
For Each ticket In emailsPerTicketID
'create new Contact
Dim newContact = Me.dest.Contact.NewContactRow
newContact.Ticket_ID = ticket.Key
newContact.CreatedAt = ticket.First().modified_time
' ...... '
dest.Contact.AddContactRow(newContact)
'TODO: add now all EmailRows'
For Each emailRow In ticket
Dim newContactDetail = dest.ContactDetail.NewContactDetailRow
newContactDetail.ContactRow = newContact
newContactDetail.Interaction = emailRow.interaction
' .... '
dest.ContactDetail.AddContactDetailRow(newContactDetail)
Next
Next
I will have a look if this is faster than the iterating approach with a HashSet to detect if the contact was already created.
Upvotes: 1
Views: 547
Reputation: 2233
My VB is rusty, but here's a shot at it:
Dim ticketGroups = From c in dest.Contact
Group c By Ticket_ID = c.Ticket_ID
Into Tickets = Group
For Each ticketGroup In ticketGroups
For Each ticket in ticketGroup.Tickets
' Create the row, add it, etc.
Dim newContact = Me.dest.Contact.NewContactRow
newContact.Ticket_ID = ticketGroup.Ticket_ID
' .... '
dest.Contact.AddContactRow(newContact)
Next
Next
Alternatively, if you want to check it every time through the loop, you could use a HashSet, just adding the ticket ID to the hashset each time through and then checking for its presence via the Contains
method. That would be faster than what you're doing, but I suspect the LINQ grouping will be faster than the HashSet.
Upvotes: 1
Reputation: 46929
I think using Lookup(like a dictionary but with key/Collection instead) would be a good solution for you. something like this:
var lookup = ds.Tables["src"].AsEnumerable().ToLookup(x => x.Field<int>("Ticket_ID"));
foreach (var row in ds.Tables["dest"].AsEnumerable())
{
if(!lookup.Contains(row.Field<int>("Ticket_ID ")))
{
//create new Contact
}
else
{
//do other struff
}
}
If you need any help translating any of the syntax to VB comment me.
Upvotes: 1