Extracting text from 'p' that's in 'div'

Question

What I'm trying to do is simple: go to https://www.reddit.com/new/ and extract only the titles of the first 3 posts. I tried extracting just the first title before moving on to the next 2, but I keep running into problems. I would appreciate any help I could get.

import urllib
from bs4 import BeautifulSoup
import requests


quote_page = 'https://www.reddit.com/r/new/'
page = urllib.urlopen(quote_page)
soup = BeautifulSoup(requests.get(quote_page).text, 'html.parser')
title_box = soup.find('div', {'class':'top-matter'})

title = title_box.text.strip()
print(title)

Error output:

Traceback (most recent call last):
  File "/home/ad044/Desktop/sidebar stuff/123.py", line 13, in <module>
    title = title_box.text.strip()
AttributeError: 'NoneType' object has no attribute 'text'
[Finished in 1.8s with exit code 1]
[shell_cmd: python -u "/home/ad044/Desktop/sidebar stuff/123.py"]
[dir: /home/ad044/Desktop/sidebar stuff]
[path: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin]

QHarr · Accepted Answer

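The page builds its content with JavaScript, so the HTML that requests downloads does not contain the elements you are looking for. That is why find returns None and the .text access fails. You need a tool such as Selenium, which drives a real browser and lets the page render before you query it. Once the elements are present, you can slice the resulting list to keep the first 3: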

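from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

url = 'https://www.reddit.com/new/'
driver = webdriver.Chrome()
try:
    driver.get(url)
    # Wait up to 10 seconds for the title elements to appear, then keep the first 3.
    # '.kCqBrs' looks like a generated class name, so it may change when the site is updated.
    data = WebDriverWait(driver, 10).until(
        EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".kCqBrs"))
    )[:3]
    for item in data:
        print(item.text)
finally:
    # Close the browser even if the wait times out.
    driver.quit()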
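
Before reaching for a browser, it can help to confirm that the element really is missing from the raw HTML. The following is only a minimal sketch of that check. It uses the standard library's html.parser instead of BeautifulSoup to stay dependency-free, and the names ClassFinder and has_class, as well as both HTML strings, are hypothetical stand-ins rather than anything taken from Reddit's actual markup.

```python
from html.parser import HTMLParser


class ClassFinder(HTMLParser):
    """Records whether any start tag in the fed HTML carries a given CSS class."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value can be None.
        for name, value in attrs:
            if name == "class" and value and self.class_name in value.split():
                self.found = True


def has_class(html, class_name):
    """Return True if any start tag in `html` lists `class_name` among its classes."""
    finder = ClassFinder(class_name)
    finder.feed(html)
    return finder.found


# Hypothetical markup standing in for a server response whose content
# is filled in later by a script.
static_html = '<div id="root"></div><script>render()</script>'
# Hypothetical markup standing in for the DOM after the script has run.
rendered_html = '<div id="root"><div class="top-matter"><p>Title</p></div></div>'

print(has_class(static_html, "top-matter"))
print(has_class(rendered_html, "top-matter"))
```

In practice you would pass requests.get(url).text as the first argument. If the check reports that the class is absent there but you can see it in the browser's inspector, the content is most likely rendered client-side and a tool like Selenium is the appropriate route.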
