7777777777 die

Reputation: 51

BeautifulSoup web scraping, no results

I'm trying to scrape news information from https://hk.appledaily.com/search/apple. I need to get the news content from the div elements with class="flex-feature", but find_all only returns []. I hope someone can help, thank you!

from bs4 import BeautifulSoup
import requests


page = requests.get("https://hk.appledaily.com/search/apple")

soup = BeautifulSoup(page.content, 'lxml')

results = soup.find_all('div', class_="flex-feature")


print(results)

Upvotes: 2

Views: 289

Answers (2)

Tibebes. M

Reputation: 7558

The data on that page is fetched and rendered dynamically (via JavaScript), so it is not part of the HTML that requests receives. You won't be able to get the data unless you evaluate the JavaScript.

One approach to scrape the data would be to use a headless browser.
Here is one such example using pyppeteer.

import asyncio
from pyppeteer import launch

# https://pypi.org/project/pyppeteer/

URL = 'https://hk.appledaily.com/search/apple'

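async def main():
    # Launch a headless Chromium instance and open the search page.
    browser = await launch()
    page = await browser.newPage()
    await page.goto(URL)

    # Wait until the JavaScript-rendered results are present in the DOM.
    await page.waitForSelector(".flex-feature")

    elements = await page.querySelectorAll('.flex-feature')

    for el in elements:
        # Read each element's text inside the browser context.
        text = await page.evaluate('(el) => el.textContent', el)
        print(text)

    await browser.close()

# asyncio.run creates and closes the event loop for us (Python 3.7+).
asyncio.run(main())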

output:

3小時前特朗普確診 不斷更新 特朗普新聞秘書及多名白宮職員確診 「白宮群組」持續擴大特朗普確診 不斷更新

 ... REDACTED ...

Upvotes: 1

Chris Greening

Reputation: 550

If you View page source in your browser, you'll see that flex-feature is nowhere in the HTML. That is the HTML the server initially sends back, before any JavaScript runs and renders the dynamic content. It is also the same HTML that requests.get is going to give you, which is why find_all returns [].
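If you'd rather check this from Python than from the browser, here is a minimal sketch using only the standard library. The names ClassCollector, has_class and sample_html are made up for illustration, and sample_html is a hypothetical stand-in for the server response, not the site's real markup.

```python
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Collects every CSS class name that appears in an HTML document."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        # A class attribute may hold several space-separated names.
        for name, value in attrs:
            if name == "class" and value:
                self.classes.update(value.split())


def has_class(html, class_name):
    """Return True if any tag in `html` carries `class_name`."""
    collector = ClassCollector()
    collector.feed(html)
    return class_name in collector.classes


# Hypothetical stand-in for a response that contains only an empty
# application shell; the real page markup is not reproduced here.
sample_html = '<html><body><div id="root" class="app-shell"></div></body></html>'
print(has_class(sample_html, "flex-feature"))
```

With the code from the question you could pass page.text instead of sample_html. If the result is False, the elements are being added later by JavaScript and BeautifulSoup alone won't find them.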

To access these elements, you'll likely want to use something such as Selenium that will allow you to automate a browser and render the JavaScript that is dynamically loading the page. Check out my answer to a similar question here for some insight!
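A rough sketch of what that could look like, assuming Selenium 4 with a local Chrome installation. The function name fetch_texts and the 10-second default timeout are my own choices for illustration, and I haven't tuned this against the site, so treat it as a starting point rather than a finished scraper.

```python
def fetch_texts(url, selector, timeout=10):
    """Render `url` in headless Chrome and return the text of matching elements."""
    # Imported inside the function so that this file can still be loaded
    # where Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")  # run without a visible window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Block until the JavaScript has added at least one matching element;
        # raises TimeoutException if nothing appears within `timeout` seconds.
        elements = WebDriverWait(driver, timeout).until(
            EC.presence_of_all_elements_located((By.CSS_SELECTOR, selector))
        )
        return [el.text for el in elements]
    finally:
        # Always shut the browser down, even if the wait times out.
        driver.quit()
```

For the page in the question you would call fetch_texts("https://hk.appledaily.com/search/apple", ".flex-feature") and print each returned string. The explicit wait is the important part: without it you can end up reading the DOM before the results have been rendered.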


Upvotes: 1
