Reputation: 851
Do you have any suggestions to improve the performance of my code below? This is the .NET 2.0 framework, and the DataTable gets bound to a DataGridView. The data is loaded into the DataTable via .ReadXml(); it doesn't come from a database. There can be anywhere from 80 to 100,000 records. The Terms() array is what the user passed in for the search, so "bob taco" would be Terms(0) = "bob", Terms(1) = "taco". I have a regex that keeps any quoted terms together, so a quoted "bob taco" would be a single element of the array. Is there a better way? I tried using a DataView, since that has better performance, but it didn't look like I could use the LIKE operator. Any suggestions are welcome; I'd really like to speed this up a bit.
Public Function Search(ByVal Terms() As String, ByRef ResidentTBL As DataTable) As DataTable
    'Dim t As Long = Now.Ticks
    Dim newtable As DataTable = ResidentTBL.Clone()
    Dim termsceiling As Integer = Terms.GetUpperBound(0)
    Dim filterstr As String = String.Empty
    Dim foundrows() As DataRow
    Dim sortOrder As String = "displayname ASC"
    Dim tempstr As String
    For j As Integer = 0 To termsceiling
        'escape or strip characters that are invalid in a filter expression
        tempstr = Terms(j).Trim().ToUpper()
        tempstr = tempstr.Replace("'", "''")
        tempstr = tempstr.Replace("*", "")
        tempstr = tempstr.Replace("%", "")
        'assemble the filter expression
        filterstr = filterstr & _
            "((column1 LIKE '" & tempstr & "%') OR " & _
            "(column2 LIKE '" & tempstr & "%') OR " & _
            "(column3 LIKE '" & tempstr & "%') OR " & _
            "(column4 LIKE '" & tempstr & "%') OR " & _
            "(column5 LIKE '" & tempstr & "%') OR " & _
            "(column6 LIKE '" & tempstr & "%') OR " & _
            "(column7 LIKE '" & tempstr & "%') OR " & _
            "(column8 LIKE '" & tempstr & "%') OR " & _
            "(column9 LIKE '" & tempstr & "%') OR " & _
            "(column10 LIKE '" & tempstr & "%'))"
        'if there are further iterations, append an AND (the user typed more than one term)
        If j <> termsceiling Then
            filterstr = filterstr & " AND "
        End If
    Next j
    filterstr = "(" & filterstr & ")" 'wrap the entire expression
    foundrows = ResidentTBL.Select(filterstr, sortOrder)
    For i As Integer = 0 To foundrows.Length - 1
        newtable.ImportRow(foundrows(i))
    Next i
    'Begin Debugging Code:
    't = Now.Ticks - t
    'MessageBox.Show("Took " & (t / 10000000) & " seconds.")
    'End Debugging Code
    Return newtable
End Function
Upvotes: 1
Views: 1065
Reputation: 100627
Have you considered converting your search to LINQ to XML? That would save you the heaviness of the DataTable. (Note that LINQ requires .NET 3.5 or later, so this would mean upgrading from 2.0.)
You could simply load your XML file into an XDocument and run LINQ queries against the elements of the in-memory XML object. You can find more questions and answers on the Linq-To-Xml tag page. Perhaps start with LINQ To XML: How Does It Work?
There's an extension method that will really help you: Contains()
If you want to continue with the DataTable implementation, here's more info on using LINQ with DataTables.
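As a minimal sketch of that approach, assuming the project can be upgraded to .NET 3.5 and assuming a hypothetical residents.xml whose <user> elements carry a displayname attribute (adjust the names to your real schema):

```vb
' Sketch only: assumes .NET 3.5+, plus hypothetical element/attribute names.
Dim doc As XDocument = XDocument.Load("residents.xml")
Dim term As String = "bob".ToUpper()

' Keep every <user> whose displayname contains the search term.
Dim matches = From u In doc.Descendants("user") _
              Where u.Attribute("displayname").Value.ToUpper().Contains(term) _
              Select u

For Each u In matches
    Console.WriteLine(u.Attribute("displayname").Value)
Next
```

For a prefix match like the question's LIKE 'term%', StartsWith() can replace Contains().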
Upvotes: 0
Reputation: 1541
I would suggest not creating a filter, but instead doing something like
foreach (DataRow row in ResidentTBL.Rows)
    if (IsMatch(row))
        newtable.ImportRow(row);
where the IsMatch method implements the logic in your filter. This should give you a little more fine-tuned control. DataTable filters are designed to be generic, so they use a generalized key-matching algorithm, and that is not always the most efficient way to find a record.
Note: my example is C#; you'll need to adjust for VB, which I am not as versed in.
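A rough VB translation of that loop, with a hypothetical IsMatch helper that mirrors the question's LIKE 'term%' logic (every term must prefix-match at least one column):

```vb
For Each row As DataRow In ResidentTBL.Rows
    If IsMatch(row, Terms) Then
        newtable.ImportRow(row)
    End If
Next

' Hypothetical helper: True only if every term prefix-matches some column.
Private Function IsMatch(ByVal row As DataRow, ByVal Terms() As String) As Boolean
    For Each term As String In Terms
        Dim termFound As Boolean = False
        For i As Integer = 0 To row.Table.Columns.Count - 1
            If row(i).ToString().StartsWith(term, StringComparison.OrdinalIgnoreCase) Then
                termFound = True
                Exit For
            End If
        Next
        If Not termFound Then Return False
    Next
    Return True
End Function
```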
Upvotes: 1
Reputation: 84813
I would do this differently on two counts: first, I would run the query on the database and fill the DataTable only with the correct (filtered) data.
Second, this way of using LIKE will scan all of the records every time, making it slow when you have lots of users. What I end up doing in this situation is implementing a "poor man's search engine": parse the text from those fields into words and insert each word into another table along with the corresponding user ID. That table can be indexed, and the search can then use '=' instead of LIKE, making it much faster.
Edit: Not having a database complicates things, especially since you have a lot of data and I don't know whether there is any indexing or optimization when searching like this. If you have a way of caching the DataTable between requests, you could build another DataTable with the parsed data. As long as the user searches for the same kind of tokens it should work, but to find "bob taco" as two adjacent words you need to save the position of each word when you parse the data and search accordingly (which makes it a little more complicated).
For example:
ID, Text
1, Hangs out at Bob Taco joint
2, Hates Bob and his taco
Would give something like this:
ID, Key, Pos
1, Hangs, 1
1, out, 2
1, at, 3
1, Bob, 4
1, Taco, 5
1, joint, 6
2, Hates, 1
2, Bob, 2
2, and, 3
2, his, 4
2, taco, 5
So now you need to search for the IDs that contain both bob and taco where their Pos values differ by 1. For example, ID 2 shouldn't be found, since its Pos values are 2 and 5.
I did this using temporary tables in SQL. If you need to work only in memory it might get harder.
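If it has to stay in memory, the same index can be sketched with a Dictionary mapping each word to its (ID, Pos) occurrences (all names here are hypothetical):

```vb
' Sketch: in-memory inverted index of word -> list of (ID, Pos) pairs.
Dim index As New Dictionary(Of String, List(Of KeyValuePair(Of Integer, Integer)))

Sub IndexText(ByVal id As Integer, ByVal text As String)
    Dim words() As String = text.Split(" "c)
    For pos As Integer = 0 To words.Length - 1
        Dim key As String = words(pos).ToLower()
        If Not index.ContainsKey(key) Then
            index.Add(key, New List(Of KeyValuePair(Of Integer, Integer)))
        End If
        ' Positions are 1-based, matching the table above.
        index(key).Add(New KeyValuePair(Of Integer, Integer)(id, pos + 1))
    Next
End Sub
```

A phrase search for "bob taco" then intersects index("bob") and index("taco"), keeping only IDs where a taco position equals a bob position plus one.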
Upvotes: 0
Reputation: 2950
I checked a few issues and decided to rewrite my answer to be more accurate. Look at your string handling: every time you assign a string a new value, a brand-new string object is created, and you are doing many large string operations inside a loop. First you take a value and do some replacing; then you build a big concatenation (the LIKE section). Try String.Format or, much better, the StringBuilder class. String operations can severely degrade performance.
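For example, the filter assembly from the question could be rebuilt with a StringBuilder, which appends into one buffer instead of allocating a new string on every concatenation (a sketch against the same hypothetical column1..column10 names):

```vb
Dim sb As New StringBuilder()
For j As Integer = 0 To termsceiling
    Dim tempstr As String = Terms(j).Trim().ToUpper()
    tempstr = tempstr.Replace("'", "''").Replace("*", "").Replace("%", "")
    sb.Append("(")
    For col As Integer = 1 To 10
        If col > 1 Then sb.Append(" OR ")
        sb.Append("(column").Append(col).Append(" LIKE '")
        sb.Append(tempstr).Append("%')")
    Next
    sb.Append(")")
    If j <> termsceiling Then sb.Append(" AND ")
Next
Dim filterstr As String = "(" & sb.ToString() & ")"
```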
Upvotes: 3