Reputation: 3411
I have a spreadsheet that connects to my MySQL database and makes changes to it. I think the way I'm checking for duplicates is very inefficient, but I'm having a hard time coming up with something better. Going item by item doesn't take long with 100 items, but it takes VERY long with 50,000, and I'm trying to shorten the amount of time this process takes.
'/ Define last row with data in it
lastRow = Range("C" & Rows.Count).End(xlUp).Row
'/ Go through each row, line by line, to check if this part number is in the system
For c = 5 To lastRow Step 1
    '/ (partNumber and debitNum hold the values being checked for this row; assignment not shown)
    '/ Based on user input, runs a SELECT query to see if the part number in question is already in the table they're trying to upload into
    SQLStr = "SELECT quotePartNumber FROM " & table & _
             " WHERE (quotePartNumber, debitNumber) IN (('" & partNumber & "', '" & debitNum & "')) LIMIT 1"
    '/ Recreate the recordset each pass, since it's destroyed at the bottom of the loop
    Set rs = New ADODB.Recordset
    rs.Open SQLStr, conn, adOpenStatic
    '/ Dump to spreadsheet
    With Worksheets("DATA").Range("N28")
        .ClearContents
        '/ Copy the SQL output to cell N28, then check whether an output exists. If the part is already in the table, a part number will fill N28; if not, N28 stays blank.
        .CopyFromRecordset rs
    End With
    '/ If the part already exists, delete this part number from the spreadsheet
    pnCheck = Range("N28").Value
    If pnCheck <> "" Then
        Range(Cells(c, 1), Cells(c, 11)).Select
        Selection.Delete Shift:=xlUp
        deleted = deleted + 1
        c = c - 1   '/ re-check this row index, since the rows below shifted up
    End If
    '/ Reset rs for the next run through
    rs.Close
    Set rs = Nothing
    Range("N28") = ""
    debit = False
Next c   '/ note: lastRow isn't recalculated after deletes, so the tail of the loop hits blank rows
So basically, I go through each line in the spreadsheet, check to see if that part exists already in my table, and if it does, I delete that line from the spreadsheet. Once the list is whittled down to what's NOT in the table already, I upload it (that code is not shown here...)
Is there a more efficient way of checking anywhere from 5,000 to 100,000 part numbers against a table than going line by line?
Upvotes: 0
Views: 907
Reputation: 965
Sorry, misunderstood initially.
How does MySQL respond if you try to upload rows with duplicate IDs? If the field (or the pair of fields) has to be unique, define it with a UNIQUE constraint; then an INSERT IGNORE will just throw a warning and skip the offending rows instead of inserting them.
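For example, something like this from VBA — just a sketch, assuming the quotePartNumber/debitNumber pair from your question is what has to be unique (the constraint name uq_part_debit is made up):
'/ One-time setup: enforce uniqueness at the database level
'/ ("table", "conn", "partNumber", and "debitNum" are the same variables as in your code)
conn.Execute "ALTER TABLE " & table & _
    " ADD CONSTRAINT uq_part_debit UNIQUE (quotePartNumber, debitNumber)"
'/ With the constraint in place, INSERT IGNORE skips duplicate rows
'/ with a warning instead of aborting the whole statement
conn.Execute "INSERT IGNORE INTO " & table & _
    " (quotePartNumber, debitNumber) VALUES ('" & partNumber & "', '" & debitNum & "')"
That way you can skip the per-row SELECT entirely and let the database reject the duplicates during the upload.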
That aside, this is something Excel can do faster than you repeatedly querying, writing, deleting, and updating. Can you dump ALL of the table's IDs into a different sheet of your spreadsheet? Then you could use Excel's COUNTIF formula and delete anything it finds. I'm on a phone so it's hard to be specific, but: put a COUNTIF next to each spreadsheet ID that counts it within the range of MySQL IDs, then filter for counts > 0 and delete those rows.
You could also dump the IDs into one column, use an adjacent column to track which came from the spreadsheet and which came from the db, use Excel's Remove Duplicates, then delete the db entries. All that's left are the unique spreadsheet entries.
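A rough sketch of the first idea — it assumes your part numbers live in column C starting at row 5 (per your lastRow code), plus a helper sheet named DB_IDS that I made up:
Dim rs As ADODB.Recordset
Set rs = New ADODB.Recordset
'/ One query for ALL the IDs instead of one query per row
rs.Open "SELECT quotePartNumber FROM " & table, conn, adOpenForwardOnly
Worksheets("DB_IDS").Range("A:A").ClearContents
Worksheets("DB_IDS").Range("A1").CopyFromRecordset rs
rs.Close

Dim lastRow As Long, c As Long
lastRow = Range("C" & Rows.Count).End(xlUp).Row
For c = 5 To lastRow
    '/ COUNTIF against the dumped IDs instead of hitting the database
    If Application.WorksheetFunction.CountIf( _
            Worksheets("DB_IDS").Range("A:A"), Cells(c, 3).Value) > 0 Then
        Cells(c, 12).Value = "DUP"   '/ flag in a spare column (L) for one delete later
    End If
Next c
If you need to match the part number/debit number pair rather than the part number alone, concatenate the two into a key column on both sides and COUNTIF on that instead.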
In general, for performance: selecting and then modifying is slow. Instead of selecting a range and deleting the selection, delete the range outright. Deleting and shifting one row at a time is slow too, so flag the offending rows in an adjacent column and delete them all at once. Also, turn off screen updating with Application.ScreenUpdating = False; redrawing the screen after every change is slow as well.
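Putting those together, a minimal sketch — it assumes the "DUP" flags in column L from the sketch above, and that your data sits in columns A:K as in your delete code:
Application.ScreenUpdating = False

Dim toDelete As Range, c As Long, lastRow As Long
lastRow = Range("C" & Rows.Count).End(xlUp).Row
For c = 5 To lastRow
    If Cells(c, 12).Value = "DUP" Then
        '/ Collect the flagged rows (data columns A:K plus the flag in L)
        If toDelete Is Nothing Then
            Set toDelete = Range(Cells(c, 1), Cells(c, 12))
        Else
            Set toDelete = Union(toDelete, Range(Cells(c, 1), Cells(c, 12)))
        End If
    End If
Next c

'/ One delete instead of thousands of individual shift-up operations
If Not toDelete Is Nothing Then toDelete.Delete Shift:=xlUp

Application.ScreenUpdating = True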
Upvotes: 1