Reputation: 5986
i have 2 datatables each datatable has 300,000 rows (datatables imported from 2 excel worksheets with OLEDB).
the first datatable is 'dtTosearch' and the second datatable is 'sourceDt'.
here is an example for the 2 tables:
i need to find for each row in 'untagged' column (sourceDt) a match to each row in 'token' column (dtTosearch). the match conditions are:
ive added the relevant parts of the code, it works fine but not in the way i want, i want to improve processing time, look at the foreach loop in linqQuery() function - i will be thankful if you help me to replace that loop by expanding my query to condition number 4, the loop operate for condition 4 because the result of the linq query ordered by 'token' length in descending order so it will exit and return the result with the maximum row length.
Private Sub startScanning()
Dim siteNum As Double
Dim categoryNum As Double
Dim stringToSearchin As String
For i = 0 To sourceDt.Rows.Count - 1
siteNum = sourceDt.Rows(i).Item(0)
categoryNum = sourceDt.Rows(i).Item(1)
stringToSearchin = sourceDt.Rows(i).Item(3)
Debug.WriteLine( linqQuery(siteNum, categoryNum, stringToSearchin) & " " &
stringToSearchin)
Next
End Sub
Private Function linqQuery(ByVal sitenum As Double, ByVal cat As Double,
ByVal stringToSearchIn As String) As String
Dim query = From row In dtTosearch
Where (row.Field(Of Double?)("site") = sitenum And
row.Field(Of Double?)("category") = cat)
Order By row.Field(Of String)("token").Length Descending
Select New With {
.token = row.Field(Of String)("token")
}
For Each x In query
If stringToSearchIn.Contains(x.token) Then
Return x.token
Exit Function
End If
Next
Return ""
End Function
Thank you very much for your time and consideration!
Upvotes: 0
Views: 1126
Reputation: 109079
In C# code you can find the records you want by
from r1 in sourceDt.AsEnumerable()
join r2 in dtToSearch.AsEnumerable()
on new { p1 = r1.Field<int>("cat"), p2 = r1.Field<int>("Site") }
equals new { p1 = r2.Field<int>("category"), p2 = r2.Field<int>("site") }
into r2g
select new
{
r1,
p2 = (from r2 in r2g
let token = r2.Field<string>("token")
where token.Contains(r1.Field<string>("untagged"))
orderby token.Length descending, token
select r2).FirstOrDefault()
}
That is, first group join the datatables on the fields they must have in common. The join - into
syntax makes this a group join as explained here.
Then, within a group of matching rows, the rows containing the value of untagged
in their token
s are filtered, and the one with the longest token is returned (also alphabetically the first one, in case there are ties on length).
Upvotes: 5