Nick

Reputation: 47

Webscrape loop on all URLs in Column A

I'm trying to scrape the Facebook video titles from a list of URLs.

I've got my macro working for a single video in which the URL is built into the code. I'd like the script to instead loop through each URL in Column A and output the Video Title into Column B. Any help?

Current code:

Sub ScrapeVideoTitle()    
    Dim appIE As Object
    Set appIE = CreateObject("internetexplorer.application")

    With appIE
        .navigate "https://www.facebook.com/rankertotalnerd/videos/276505496352731/"
        .Visible = True

        Do While appIE.Busy        
            DoEvents
        Loop

        'Add Video Title to Column B
        Range("B2").Value = appIE.document.getElementsByClassName("_4ik6")(0).innerText

        appIE.Quit
        Set appIE = Nothing
    End With
End Sub

Upvotes: 1

Views: 1081

Answers (2)

QHarr

Reputation: 84465

Provided you can add a reference to Microsoft HTML Object Library (VBE > Tools > References), you can do the following:

Read all the URLs into an array. Loop over the array and use XMLHTTP to issue a GET request to each page. Read the response into an HTMLDocument variable, use a CSS selector to extract the title, and store it in a results array. At the end of the loop, write all the results out to the sheet in one go.

Option Explicit
Public Sub GetTitles()
    Dim urls(), ws As Worksheet, lastRow As Long, results(), i As Long, html As HTMLDocument

    Set html = New HTMLDocument
    Set ws = ThisWorkbook.Worksheets("Sheet1")
    With ws
        'Read all URLs from A2 down to the last used row into a 1D array
        lastRow = .Cells(.Rows.Count, "A").End(xlUp).Row
        urls = Application.Transpose(.Range("A2:A" & lastRow).Value)
    End With
    ReDim results(1 To UBound(urls))
    With CreateObject("MSXML2.XMLHTTP")
        For i = LBound(urls) To UBound(urls)
            If InStr(urls(i), "http") > 0 Then
                'Synchronous GET request; parse the response into the HTMLDocument
                .Open "GET", urls(i), False
                .send
                html.body.innerHTML = .responseText
                'Extract the title with a CSS selector
                results(i) = html.querySelector(".uiHeaderTitle span").innerText
            End If
        Next
    End With
    'Write all results to column B in one go
    ws.Cells(2, 2).Resize(UBound(results), 1) = Application.Transpose(results)
End Sub
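
One thing to watch: querySelector returns Nothing when the selector matches no element, so reading .innerText directly raises a run-time error for any page that lacks that element. A minimal sketch of a guard, assuming the same reference to Microsoft HTML Object Library is set; the helper name SafeInnerText is hypothetical and not part of any library:

```vba
' Hypothetical helper: returns the text of the first element matching
' the CSS selector, or an empty string when nothing matches.
Public Function SafeInnerText(ByVal html As HTMLDocument, ByVal selector As String) As String
    Dim node As Object
    Set node = html.querySelector(selector)
    ' Only read innerText when an element was actually found;
    ' otherwise the function keeps its default value of ""
    If Not node Is Nothing Then SafeInnerText = node.innerText
End Function
```

Inside the loop the assignment would then become results(i) = SafeInnerText(html, ".uiHeaderTitle span"), which leaves the cell blank for pages without a match instead of stopping the macro.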

Upvotes: 1

SamP

Reputation: 176

If you had the "276505496352731" part of the URL, or indeed the whole URL, in Column A, you could set a range to the top value and then loop until the range was empty, moving it down one row for each scrape.

Something like:

'Dims as before
Dim r As Range

With appIE

  Set r = Range("A2")  ' Assumes A2 is the top of the URL list
  Do While r.Value <> ""

    .navigate r.Value
    'Do the rest of your IE stuff (wait for the page to finish loading, etc.)
    r.Offset(0, 1).Value = .document.getElementsByClassName("_4ik6")(0).innerText

    Set r = r.Offset(1)
  Loop
End With
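
Put together with the code from the question, the whole macro might look something like the sketch below. It assumes row 1 is a header so the URLs start in A2, and that the "_4ik6" class name from the question still matches the title element; the Sub name ScrapeVideoTitles is just a placeholder.

```vba
Public Sub ScrapeVideoTitles()
    Dim appIE As Object
    Dim r As Range
    Dim titles As Object

    Set appIE = CreateObject("InternetExplorer.Application")
    appIE.Visible = True

    ' Assumption: row 1 is a header and the URLs start in A2
    Set r = ActiveSheet.Range("A2")

    ' Walk down column A until the first empty cell
    Do While r.Value <> ""
        appIE.navigate r.Value

        ' Wait until the page has finished loading (4 = READYSTATE_COMPLETE)
        Do While appIE.Busy Or appIE.readyState <> 4
            DoEvents
        Loop

        ' "_4ik6" is the class name used in the question
        Set titles = appIE.document.getElementsByClassName("_4ik6")
        If titles.Length > 0 Then
            ' Write the title into column B of the same row
            r.Offset(0, 1).Value = titles(0).innerText
        End If

        Set r = r.Offset(1)
    Loop

    ' Close IE once, after every URL has been processed
    appIE.Quit
    Set appIE = Nothing
End Sub
```

Checking titles.Length before reading the first item means a page without a matching element leaves its row in Column B blank rather than raising an error.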

That should help hopefully.

Upvotes: 0
