BeautifulSoup web scraping, no results

Question

I'm trying to scrape news from https://hk.appledaily.com/search/apple. I need to get the news content from the div elements with class="flex-feature", but find_all only returns []. I hope someone can help, thank you!

from bs4 import BeautifulSoup
import requests


page = requests.get("https://hk.appledaily.com/search/apple")

soup = BeautifulSoup(page.content, 'lxml')

results = soup.find_all('div', class_ = "flex-feature")


print(results)

Tibebes. M · Accepted Answer

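The data on that page is fetched and rendered dynamically with JavaScript. requests only downloads the initial HTML and never runs any scripts, so the flex-feature divs are not in the document that BeautifulSoup parses. You won't be able to get the data unless something evaluates the JavaScript.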
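To see why find_all comes back empty, it helps to compare the HTML that requests receives with what the browser shows after its scripts have run. The sketch below uses two made-up HTML strings (assumptions for illustration, not the real Apple Daily markup): a script-only shell like the one a JavaScript-rendered page typically serves, and the same page after rendering. It uses BeautifulSoup's built-in "html.parser" so that lxml is not required; the helper name feature_texts is hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: what a JS-rendered page typically serves before its
# scripts run (an empty mount point plus a script tag).
STATIC_SHELL = """
<html><body>
  <div id="root"></div>
  <script src="/bundle.js"></script>
</body></html>
"""

# Hypothetical markup: the same page after the browser has executed the
# JavaScript and injected the search results.
RENDERED = """
<html><body>
  <div id="root">
    <div class="flex-feature">Headline one</div>
    <div class="flex-feature">Headline two</div>
  </div>
</body></html>
"""


def feature_texts(html: str) -> list[str]:
    """Return the text of every div whose class list includes "flex-feature"."""
    soup = BeautifulSoup(html, "html.parser")
    return [div.get_text(strip=True) for div in soup.find_all("div", class_="flex-feature")]


print(feature_texts(STATIC_SHELL))  # the shell contains no such divs
print(feature_texts(RENDERED))
```

BeautifulSoup only parses the string it is given and never executes script tags, so if the live page behaves like STATIC_SHELL, a quick check such as 'flex-feature' in page.text on the question's response should also come back False.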

One approach to scrape the data would be to use a headless browser.
Here is one such example using pyppeteer.

import asyncio
from pyppeteer import launch

# pyppeteer drives a headless Chromium: https://pypi.org/project/pyppeteer/

URL = 'https://hk.appledaily.com/search/apple'

async def main():
    browser = await launch()  # headless by default
    try:
        page = await browser.newPage()
        await page.goto(URL)

        # Wait until the JavaScript has rendered at least one result.
        await page.waitForSelector('.flex-feature')

        elements = await page.querySelectorAll('.flex-feature')

        for el in elements:
            # Read textContent inside the page for each element handle.
            text = await page.evaluate('(el) => el.textContent', el)
            print(text)
    finally:
        # Close the browser even if waitForSelector times out.
        await browser.close()

asyncio.run(main())

output:

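3小時前特朗普確診 不斷更新 特朗普新聞秘書及多名白宮職員確診 「白宮群組」持續擴大特朗普確診 不斷更新

(English: 3 hours ago, Trump confirmed infected, continuously updated; Trump's press secretary and several White House staff confirmed infected; "White House cluster" keeps expanding; Trump confirmed infected, continuously updated)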

 ... (remaining output truncated) ...
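If you would rather keep the BeautifulSoup parsing from the question, one option is to let pyppeteer render the page and hand the resulting HTML to BeautifulSoup; pyppeteer's page.content() returns the page's current HTML, including elements added by JavaScript. This is a sketch, assuming pyppeteer is installed and that .flex-feature is still the right selector for the live site; the function names fetch_rendered_html, parse_features and main are hypothetical.

```python
import asyncio

from bs4 import BeautifulSoup

URL = "https://hk.appledaily.com/search/apple"


async def fetch_rendered_html(url: str) -> str:
    """Load url in headless Chromium and return the HTML after scripts have run."""
    # Imported here so parse_features can be used without pyppeteer installed.
    from pyppeteer import launch

    browser = await launch()
    try:
        page = await browser.newPage()
        await page.goto(url)
        # Wait until the JavaScript has inserted at least one result.
        await page.waitForSelector(".flex-feature")
        return await page.content()
    finally:
        await browser.close()


def parse_features(html: str) -> list[str]:
    """Same find_all call as in the question, applied to the rendered HTML."""
    soup = BeautifulSoup(html, "html.parser")
    # " " separates text from nested tags; strip=True trims each piece.
    return [div.get_text(" ", strip=True) for div in soup.find_all("div", class_="flex-feature")]


def main() -> None:
    html = asyncio.run(fetch_rendered_html(URL))
    for text in parse_features(html):
        print(text)
```

Calling main() launches Chromium, waits for the results, and prints one line per flex-feature div. Splitting fetching from parsing means the parsing step can be tested on saved HTML without starting a browser.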
