What I'm trying to do is simple: go to https://www.reddit.com/new/ and extract only the titles of the first 3 posts. I've tried extracting the title of just the first one before moving on to the next 2, but I keep running into problems. I'd appreciate any help I can get.
import urllib
from bs4 import BeautifulSoup
import requests
quote_page = 'https://www.reddit.com/r/new/'
page = urllib.urlopen(quote_page)
soup = BeautifulSoup(requests.get(quote_page).text, 'html.parser')
title_box = soup.find('div', {'class':'top-matter'})
title = title_box.text.strip()
print(title)
Error output:
Traceback (most recent call last):
File "/home/ad044/Desktop/sidebar stuff/123.py", line 13, in <module>
title = title_box.text.strip()
AttributeError: 'NoneType' object has no attribute 'text'
[Finished in 1.8s with exit code 1]
[shell_cmd: python -u "/home/ad044/Desktop/sidebar stuff/123.py"]
[dir: /home/ad044/Desktop/sidebar stuff]
[path: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin]
Upvotes: 1
Views: 42
Reputation: 84465
The page uses JavaScript to render its content, so you need a tool like Selenium that renders the page before you query your elements of interest. You can then slice the resulting list to keep the first three.
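from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

url = 'https://www.reddit.com/new/'
driver = webdriver.Chrome()
try:
    driver.get(url)
    # Wait up to 10 s for the title elements to be rendered, then keep the first 3.
    # ".kCqBrs" looks like an auto-generated class name and may change over time.
    data = WebDriverWait(driver, 10).until(
        EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".kCqBrs")))[:3]
    for item in data:
        print(item.text)
finally:
    # Close the browser even if the wait times out.
    driver.quit()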
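If running a browser is not an option, one alternative worth trying (a different technique from the Selenium approach above, not part of the original answer) is Reddit's JSON listing: appending `.json` to a listing URL returns the posts as data rather than JavaScript-rendered HTML. The sketch below assumes that `https://www.reddit.com/new.json` is still reachable without authentication and that posts sit under `data.children[i].data.title`; the helper names `first_titles` and `fetch_listing` and the User-Agent string are hypothetical, chosen for illustration.

```python
import json
import urllib.request

# Assumption: Reddit still serves the /new listing as JSON at this URL for
# unauthenticated clients; availability and rate limits may change.
LISTING_URL = "https://www.reddit.com/new.json?limit=3"


def first_titles(listing, n=3):
    """Return the titles of the first n posts in a Reddit listing dict (hypothetical helper)."""
    # A listing nests each post as listing["data"]["children"][i]["data"].
    children = listing.get("data", {}).get("children", [])
    return [child["data"]["title"] for child in children[:n]]


def fetch_listing(url=LISTING_URL):
    """Download and decode the JSON listing (requires network access; hypothetical helper)."""
    # Reddit asks clients to send a descriptive User-Agent; this value is a placeholder.
    request = urllib.request.Request(
        url, headers={"User-Agent": "title-scraper-example/0.1"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

Usage: `for title in first_titles(fetch_listing()): print(title)`. Keeping the parsing in its own function means it can be checked against a saved listing without touching the network.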
Upvotes: 1