TCritical

Reputation: 71

Loop through page numbers when href contains __doPostBack() in webpage

I need to scrape data from every page by clicking the page numbers on the webpage below.

I have linked a sample website whose HTML is similar to my page.

The sample web page is this Webpage.

The code I have is below:

Sub Test()
Dim IE As Object
Dim i As Long, strText As String
Dim y As Long, z As Long, wb As Excel.Workbook, ws As Excel.Worksheet
Dim myBtn As Object
Dim Table As Object, tbody As Object, datarow As Object, thlist As Object, trlist As Object

Set wb = Excel.ActiveWorkbook
Set ws = wb.Worksheets("Data")   ' write to the "Data" sheet explicitly
ws.Select

Set IE = CreateObject("InternetExplorer.Application")
my_url = "webpage.com"   ' placeholder URL; it must be a quoted string
With IE
    .Visible = True
    .navigate my_url
    Do Until Not IE.Busy And IE.readyState = 4
        DoEvents
    Loop
End With
Set doc = IE.document
y = 1
z = 1
Application.Wait Now + TimeValue("00:00:02")
Set tbody = IE.document.getElementsByTagName("table")(0).getElementsByTagName("tbody")(0)
Set thlist = tbody.getElementsByTagName("tr")(0).getElementsByTagName("th")
Dim ii As Integer
For ii = 0 To thlist.Length - 1
    ws.Cells(z, y).Value = thlist(ii).innerText
    y = y + 1
Next ii
Set datarow = tbody.getElementsByTagName("tr")
y = 1
z = 2
Dim jj As Integer
Dim datarowtdlist As Object
For jj = 1 To datarow.Length - 4
    Set datarowtdlist = datarow(jj).getElementsByTagName("td")
    Dim hh As Integer, x As Integer
    x = y
    For hh = 0 To datarowtdlist.Length - 1
        ws.Cells(z, x).Value = datarowtdlist(hh).innerText
        x = x + 1
    Next hh
    z = z + 1
Next jj
IE.Quit            ' close the browser before releasing the reference
Set IE = Nothing
End Sub

I'm happy to clarify if my question is not clear.

Thanks for the support.

Upvotes: 0

Views: 134

Answers (1)

QHarr

Reputation: 84475

The next page is retrieved by incrementing the __EVENTARGUMENT passed to __doPostBack (e.g. Page$1 to Page$2, Page$2 to Page$3, and so on) and then calling __doPostBack with the new value. The last page has been reached when the final td node in the pagination area no longer has a child element whose href contains the __EVENTTARGET (sb$grd). With this logic you can loop, incrementing the page number each time, and use that check as the exit condition, as shown below.

For more info about this function with ASP.NET see my answer here.

Public Sub LoopPages()

    'Requires a reference to Microsoft Internet Controls (VBE > Tools > References)
    Dim ie As SHDocVw.InternetExplorer

    Set ie = New SHDocVw.InternetExplorer

    With ie

        .Visible = True
        .Navigate2 "https://www.mfa.gov.tr/sub.ar.mfa?dcabec54-44b3-4aaa-a725-70d0caa8a0ae"
        
        While .Busy Or .readyState <> READYSTATE_COMPLETE: DoEvents: Wend
        
        Dim i As Long

        i = 1

        Do
        
            Debug.Print i
            Debug.Print .document.querySelector(".sub_lstitm").innerText
        
            If .document.querySelectorAll("tr:nth-child(1) td:last-child [href*='sb$grd']").length = 0 Then Exit Do
        
            .document.parentWindow.execScript "__doPostBack('sb$grd','Page$" & i + 1 & "');"
        
            While .Busy Or .readyState <> READYSTATE_COMPLETE: DoEvents: Wend
        
            'do something with new page
        
            i = i + 1
        
        Loop
  
        Stop                                     'stops at 185
        .Quit
    End With

End Sub
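
To fill in the 'do something with new page step, here is a rough sketch of a helper that copies the rows of the first table on the current page to a worksheet. WriteTableRows and its parameters are hypothetical names, not part of the code above. It assumes, as in the question, that the data sits in the first table on the page and that a fixed number of trailing rows (the pager) should be skipped; adjust both for your actual page.

```vba
' Hypothetical helper (not from the original answer): writes the td rows of
' the first table in doc to ws, starting at nextRow, and returns the next
' free row. skipTrailing is the assumed number of pager rows to ignore.
Public Function WriteTableRows(ByVal doc As Object, ByVal ws As Object, _
                               ByVal nextRow As Long, _
                               Optional ByVal skipTrailing As Long = 0) As Long
    Dim trNodes As Object, tdNodes As Object
    Dim r As Long, c As Long

    Set trNodes = doc.getElementsByTagName("table")(0).getElementsByTagName("tr")

    For r = 0 To trNodes.Length - 1 - skipTrailing
        Set tdNodes = trNodes(r).getElementsByTagName("td")
        If tdNodes.Length > 0 Then          ' rows with only th cells are skipped
            For c = 0 To tdNodes.Length - 1
                ws.Cells(nextRow, c + 1).Value = tdNodes(c).innerText
            Next c
            nextRow = nextRow + 1
        End If
    Next r

    WriteTableRows = nextRow
End Function
```

Inside the loop you could then call something like nextRow = WriteTableRows(.document, ws, nextRow, 3), where ws and nextRow are variables you declare yourself and 3 matches the rows the question drops with datarow.Length - 4. To capture the first page as well, make the call before the exit check rather than after the postback.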

Upvotes: 1
