Pricey

Reputation: 81

Unable to scrape many questions from a Quora webpage

I am learning BeautifulSoup and trying to scrape the links to the questions listed on this Quora page.

As I scroll down the page, more questions keep loading and are displayed.

When I try to scrape the links to these questions using the code below, I get only 5 links in my case, i.e. links to only 5 questions, even though there are many more questions on the page.

Is there a workaround to get the links to as many of the questions on the page as possible?

from urllib.parse import urljoin

from bs4 import BeautifulSoup
import requests

root = 'https://www.quora.com/topic/Graduate-Record-Examination-GRE-1'
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X x.y; rv:42.0) Gecko/20100101 Firefox/42.'}
r = requests.get(root, headers=headers)

soup = BeautifulSoup(r.text, 'html.parser')

# Container that holds the list of question stories
q = soup.find('div', {'class': 'paged_list_wrapper'})
no = 0  # number of links found
for i in q.find_all('div', {'class': 'story_title_container'}):
    t = i.a['href']
    no = no + 1
    # Resolve the href against the page URL (works for relative and absolute links)
    print(urljoin(root, t), '\n\n')

Upvotes: 1

Views: 752

Answers (2)

Daniel

Reputation: 61

This grabs the title from the page and prints it after trimming. It is just one way to do it; I'm sure there are many others. Note that it only handles a single question.

import requests
from bs4 import BeautifulSoup

URL = "https://www.quora.com/Which-Deep-Learning-online-course-is-better-Coursera-specialization-VS-Udacity-Nanodegree-vs-FAST-ai"

response = requests.get(URL)
soup = BeautifulSoup(response.text, 'html.parser')

# Grab the text of the <title> tag
question = soup.select_one('title').text

# Drop the last 8 characters, i.e. the trailing " - Quora"
print(question[:-8])
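
The slice in the code above always drops the last 8 characters, which assumes the title ends in " - Quora". As a minimal sketch (not part of the original answer), a small helper can strip the suffix only when it is actually present. The name strip_site_suffix and its suffix parameter are hypothetical, chosen for illustration.

```python
def strip_site_suffix(title: str, suffix: str = " - Quora") -> str:
    """Return the title without the trailing site suffix, if it is present."""
    # Only cut when the title really ends with the (non-empty) suffix
    if suffix and title.endswith(suffix):
        title = title[:-len(suffix)]
    return title.strip()


print(strip_site_suffix("Which course is better? - Quora"))
print(strip_site_suffix("Which course is better?"))
```

With this, a page whose title lacks the suffix is printed unchanged instead of losing its last 8 characters.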

Upvotes: 0

nandu kk

Reputation: 368

What you are trying to accomplish cannot be done with Requests and BeautifulSoup alone, because the extra questions are only loaded as the page is scrolled in a browser. You need to use Selenium. Here I give the answer using Selenium and chromedriver. Download the chromedriver that matches your Chrome version and install Selenium with pip install -U selenium

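import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

# Selenium 4 API: the driver path is passed via Service, elements are located with By
browser = webdriver.Chrome(service=Service('/path/to/chromedriver'))
browser.get("https://www.quora.com/topic/Graduate-Record-Examination-GRE-1")
time.sleep(1)

# Send PAGE_DOWN to the body so the page loads more questions
elem = browser.find_element(By.TAG_NAME, "body")
no_of_pagedowns = 5
while no_of_pagedowns:
    elem.send_keys(Keys.PAGE_DOWN)
    time.sleep(0.2)
    no_of_pagedowns -= 1

# Collect the link of every question loaded so far
post_elems = browser.find_elements(By.XPATH, "//a[@class='question_link']")
for post in post_elems:
    print(post.get_attribute("href"))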

If you are using Windows, point the chromedriver path at the .exe file instead: '/path/to/chromedriver.exe'

Change the variable no_of_pagedowns = 5 to specify how many times you want to scroll down.

I got the following output:

[screenshot of the output not available]
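
A fixed number of page-downs may stop before every question has loaded. As a rough sketch, and not part of the code above, one alternative is to keep scrolling until the page height stops growing. The helper scroll_until_stable and its parameters (get_height, scroll_down, max_rounds, pause) are hypothetical names; it takes plain callables so it can be tried without a browser, and the commented usage assumes the browser object from the Selenium code above.

```python
import time
from typing import Callable


def scroll_until_stable(get_height: Callable[[], int],
                        scroll_down: Callable[[], None],
                        max_rounds: int = 50,
                        pause: float = 0.5) -> int:
    """Scroll until the page height stops changing; return the number of scrolls."""
    rounds = 0
    last_height = get_height()
    while rounds < max_rounds:
        scroll_down()
        rounds += 1
        time.sleep(pause)  # give the page time to load more content
        new_height = get_height()
        if new_height == last_height:
            break  # nothing new was loaded, so assume the end was reached
        last_height = new_height
    return rounds


# Possible usage with Selenium (assumes `browser` is an open webdriver):
# scroll_until_stable(
#     lambda: browser.execute_script("return document.body.scrollHeight"),
#     lambda: browser.execute_script("window.scrollTo(0, document.body.scrollHeight);"),
# )
```

The max_rounds cap is there because a feed like this may keep loading for a very long time; the pause may need to be longer on a slow connection.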

Upvotes: 1
