Reputation: 9538
In the thread "Scraping table from local HTML with unicode characters", QHarr helped me scrape a table from a local HTML file. I have an HTML file at this Link
I used the same code and only changed the values of the variables 'startTableNumber', 'endTableNumber' and 'numColumns':
Public Sub Test()
    Dim fStream As ADODB.Stream, html As HTMLDocument
    Set html = New HTMLDocument
    Set fStream = New ADODB.Stream
    ' Read the file as UTF-8 so unicode characters survive
    With fStream
        .Charset = "UTF-8"
        .Open
        .LoadFromFile "C:\Users\Future\Desktop\Sample 2.html"
        html.body.innerHTML = .ReadText
        .Close
    End With
    Dim hTables As Object, startTableNumber As Long, i As Long, r As Long, c As Long
    Dim counter As Long, endTableNumber As Long, numColumns As Long
    startTableNumber = 91
    endTableNumber = 509
    numColumns = 14
    Set hTables = html.getElementsByTagName("table")
    r = 2: c = 1
    For i = startTableNumber To endTableNumber Step 2
        counter = counter + 1
        ' Start a new worksheet row every time the counter reaches 10
        If counter = 10 Then
            c = 1: r = r + 1: counter = 1
        End If
        Cells(r, c) = hTables(i).innerText
        c = c + 1
    Next i
End Sub
But I got scattered data from the table. Furthermore, I would like to find a flexible way to make the code determine those variables without assigning them manually. I hope to find a solution using Selenium. I also hope not to receive negative rep; I have done my best to clarify the issue. Regards
Upvotes: 1
Views: 193
Reputation: 84465
So, as I said in my comments, you need to study how the data appears in the later table tags and perform a mapping to get the correct ordering. The following writes out the table. As I also mentioned, this is not robust, and only the methodology may be transferable to other documents.
In your case you wouldn't be reading from a file but would use
Set tables = driver.FindElementsByCss("table[width='100%'] table:first-child")
You would then For Each over the web elements in the collection, adjusting the syntax as required, e.g. .Text instead of .innerText. There may be a few other adaptations for Selenium due to its indexing of webElements, but everything you need should be evident below.
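As a rough illustration of that adaptation, the sketch below assumes SeleniumBasic is installed and referenced as "Selenium Type Library", and that Chrome is the browser. PAGE_URL and the sub name DumpTablesWithSelenium are placeholders, since the real address is not given here. It only dumps each matched table's text next to a running counter so the ordering can be studied; it does not attempt the mapping.

```vba
Option Explicit

' Sketch only. Assumes a VBA reference to "Selenium Type Library" (SeleniumBasic).
' PAGE_URL is a hypothetical placeholder; replace it with the real page address.
Public Sub DumpTablesWithSelenium()
    Const PAGE_URL As String = "https://example.com/page.html"
    Dim driver As New ChromeDriver
    Dim tables As WebElements, el As WebElement
    Dim n As Long

    driver.Get PAGE_URL
    ' Same selector as the MSHTML version
    Set tables = driver.FindElementsByCss("table[width='100%'] table:first-child")

    For Each el In tables
        n = n + 1
        ' n is just a running counter; it may be offset from the
        ' index used with tables.item(...) in the MSHTML version
        ActiveSheet.Cells(n, 1).Value = n
        ActiveSheet.Cells(n, 2).Value = el.Text   ' .Text instead of .innerText
    Next el

    driver.Quit
End Sub
```

Looking down column B shows at which position each record starts and how many elements it spans, which is what the start index and step of the mapping loop depend on.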
VBA:
Option Explicit
Public Sub ParseInfo()
    Dim html As HTMLDocument, tables As Object, ws As Worksheet, i As Long
    Set ws = ThisWorkbook.Worksheets("Sheet2")
    Dim fStream As ADODB.Stream
    Set html = New HTMLDocument
    Set fStream = New ADODB.Stream
    ' Read the file as UTF-8 so unicode characters survive
    With fStream
        .Charset = "UTF-8"
        .Open
        .LoadFromFile "C:\Users\User\Desktop\test.html"
        html.body.innerHTML = .ReadText
        .Close
    End With
    Set tables = html.querySelectorAll("table[width='100%'] table:first-child")
    Dim rowCounter As Long: rowCounter = 2
    Dim mappings(), j As Long, headers(), arr(13)
    headers = Array("Notes", "Type", "Enrollment status", "Governorate of birth", "Year", "Month", "Day", "Date of Birth", "Religion", _
                    "Nationality", "Student Name", "National Number", "Student Code", "M")
    ' mappings(j) is the output column (0-based) for the j-th table of a record
    mappings = Array(3, 8, 9, 12, 11, 10, 2, 7, 1, 6, 5, 4, 13)
    ws.Cells(1, 1).Resize(1, UBound(headers) + 1) = headers
    ' Each record spans 26 matched tables; every second one holds a value
    For i = 89 To 504 Step 26
        arr(0) = vbNullString
        For j = 0 To 12
            arr(mappings(j)) = tables.item(2 * j + i).innerText
        Next
        ' With rowCounter starting at 2, the first record lands on row 3
        ws.Cells(rowCounter + 1, 1).Resize(1, UBound(arr) + 1) = arr
        rowCounter = rowCounter + 1
    Next
End Sub
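The reordering step can be checked in isolation, without any HTML. The following is a minimal sketch, not part of the method above: ReorderByMapping and DemoReorder are hypothetical names, and the strings "t0" to "t12" are stand-ins for the text of the 13 tables read for one record (tables.item(i), tables.item(i + 2), ... tables.item(i + 24)).

```vba
Option Explicit

' Hypothetical helper: puts values(j) into slot mappings(j) of a 14-slot row.
' Slot 0 stays empty, matching arr(0) = vbNullString in ParseInfo.
Public Function ReorderByMapping(ByVal values As Variant, ByVal mappings As Variant) As Variant
    Dim result(0 To 13) As Variant, j As Long
    result(0) = vbNullString
    For j = LBound(mappings) To UBound(mappings)
        result(mappings(j)) = values(j)
    Next j
    ReorderByMapping = result
End Function

Public Sub DemoReorder()
    Dim mappings As Variant, values As Variant, rowOut As Variant
    mappings = Array(3, 8, 9, 12, 11, 10, 2, 7, 1, 6, 5, 4, 13)
    ' Stand-in text for the 13 tables of one record, in document order
    values = Array("t0", "t1", "t2", "t3", "t4", "t5", "t6", _
                   "t7", "t8", "t9", "t10", "t11", "t12")
    rowOut = ReorderByMapping(values, mappings)
    ' Prints: |t8|t6|t0|t11|t10|t9|t7|t1|t2|t5|t4|t3|t12
    Debug.Print Join(rowOut, "|")
End Sub
```

Running DemoReorder and comparing the printed order with the headers array is a quick way to confirm, or adjust, the mapping for a different document before touching the worksheet.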
Upvotes: 1