Reputation: 771
I'm having an issue with extracting HTML-code from .HTM game reports. As you can see I'm splitting the HTML file using Split()
and it usually creates around 60,000 rows for this particular play-by-play report.
Now, for 99 games out of 100 this works perfectly fine, but every now and then (such as with this exact GameID) it for whatever reason starts putting out #N/A
after just a few thousand rows - very early into the document.
I've checked the .HTM in question and it looks just like any other document that I've extracted.
Here's the code snippet:
Dim ie As Object, doc As HTMLDocument
Dim Y1 As String, Y2 As String, GameID As String
Dim SourceCode As Worksheet, c As Range
Set ie = New InternetExplorer
Set SourceCode = Worksheets("Source Code")
Y1 = "2017"
Y2 = "2018"
GameID = "0003"
ie.navigate "http://www.nhl.com/scores/htmlreports/" & Y1 & Y2 & "/PL02" & GameID & ".HTM"
Do
DoEvents
Loop Until ie.readyState = READYSTATE_COMPLETE
Set doc = ie.document
arr = Split(doc.DocumentElement.innerHTML, vbLf)
SourceCode.Range("A1").Resize(UBound(arr) + 1, 1).Value = Application.Transpose(arr)
And this is what happens (occasionally):
Does anyone have a suggestion for how to error handle this? I haven't got the slightest clue what could be causing it.
Upvotes: 2
Views: 262
Reputation: 21629
It might be easier to use Excel's built-in web query tools. (If you haven't used the feature before it might seem more complicated than it is.) Click the DATA menu, then "From Web", enter the site, and Excel will download all the organized data it can find there. The attached image shows the steps and results, and here is a link to more about Excel web queries.
To automate, you could record a macro of the steps to import a file, and then edit the macro as VBA to enter your custom dates, etc.
If you're somewhat comfortable with MS Access, it is better suited to scraping; I have a file I could pass on which I've used for scraping everything from Google to eBay (before I learned about API's!)
EDIT:
Oh! One of your problems: Your code doesn't set ie = nothing
when it's done, so each time you run the code, Excel opens another copy of Internet Explorer (hidden in the background). CTRL-ALT-DEL to see the Task Manager and I bet you have multiple copies of iexplore.exe running, sucking up all your memory, andcausing the crashes. Reboot to fix that and try again, or try another method (after rebooting).
This is another way to do it, more memory-friendly:
Sub nhl_test()
Dim ie As Object, doc As HTMLDocument
Dim Y1 As String, Y2 As String, GameID As String, html As String
Dim SourceCode As Worksheet, c As Range, y As Long, x As Long
Set ie = New InternetExplorer
Set SourceCode = Worksheets("Source Code")
Y1 = "2017"
Y2 = "2018"
GameID = "0003"
ie.navigate "http://www.nhl.com/scores/htmlreports/" & Y1 & Y2 & "/PL02" & GameID & ".HTM"
Do
DoEvents
Loop Until ie.readyState = READYSTATE_COMPLETE
Set doc = ie.document
html = doc.DocumentElement.innerHTML 'put HTML into a variable
Set doc = Nothing 'free up memory
Set ie = Nothing 'free up memory, close IE
arr = Split(html, vbLf) 'now split the html into lines
For x = 1 To UBound(arr) ' x counts the line # in the file
If Trim(arr(x)) <> "" Then
y = y + 1 ' y counts the line # that isn't blank
SourceCode.Cells(y, 1).Value = arr(x) 'put line x into row y
End If
Next x
MsgBox "done (" & y & "rows)"
End Sub
...but it's still not a very organized set of data. What's your plan once you have the html loaded into a worksheet?
Upvotes: 0
Reputation: 19319
There's an undocumented limit on the Transpose
function at 65,536 items. There's some discussion here on MSDN and you can google 'transpose vba limit' and see some other threads.
So your files are coming in at around 60,000 rows and some may go over the 65,536 limit and it looks like you hit that limit with that GameID.
One workaround, is to create a second array e.g.
Dim arr2() As String <-- we will make this a 2d array and fill it from arr
'... later in the code
arr = Split(doc.DocumentElement.innerHTML, vbLf) '<-- get the array with your original code
ReDim Preserve arr2(1 To UBound(arr), 1 To 1) '<-- redim arr2 to be suitable for writing to a range
' now fill the array from arr - basically doing a "manual" transpose
Dim i As Long
For i = 1 To UBound(arr)
arr2(i, 1) = arr(i)
Next i
Full working code:
Option Explicit
Sub Foo()
Dim ie As Object, doc As HTMLDocument
Dim Y1 As String, Y2 As String, GameID As String
Dim SourceCode As Worksheet, c As Range
Dim arr As Variant, arr2() As String
Set ie = New InternetExplorer
Set SourceCode = Worksheets("Source Code")
Y1 = "2017"
Y2 = "2018"
GameID = "0003"
ie.navigate "http://www.nhl.com/scores/htmlreports/" & Y1 & Y2 & "/PL02" & GameID & ".HTM"
Do
DoEvents
Loop Until ie.readyState = READYSTATE_COMPLETE
Set doc = ie.document
arr = Split(doc.DocumentElement.innerHTML, vbLf)
ReDim Preserve arr2(1 To UBound(arr), 1 To 1)
Dim i As Long
For i = 1 To UBound(arr)
arr2(i, 1) = arr(i)
Next i
SourceCode.Range("A1").Resize(UBound(arr), 1).Value = arr2
' see comment by ashleedawg on the other answer
Set ie = nothing
End Sub
Upvotes: 2