Roman Vasiliev

Reputation: 39

Beautiful Soup returns the same output over and over

I'm new to web scraping. I want the scraper to return all the paragraphs containing the keyword "neuro"; however, when I run the code, it seems to return the same output on every iteration. Could you kindly point out my mistake?

import requests
from requests import get
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np
import re 

from time import sleep
from random import randint

url = "https://www.findamasters.com/masters-degrees/united-kingdom/?40w900"
results = requests.get(url)
info =[]  
page_number = np.arange(1,1219)
soup = BeautifulSoup(results.text, "html.parser")

for page in page_number:
    page = requests.get("https://www.findamasters.com/masters-degrees/united-kingdom/?40w900&PG=" + str(page))
    div = soup.find("p", string =re.compile('neuro'))

sleep(randint(2,10))

masters = pd.DataFrame({
    'info': div})
masters.to_csv('masters.csv')

But the only output I get is:

<p>It’s our mission to prolong and improve the lives of patients, and we seek to do this by conducting world-leading research in areas such as neuroscience, oncology, infectious diseases and more.</p>
<p>It’s our mission to prolong and improve the lives of patients, and we seek to do this by conducting world-leading research in areas such as neuroscience, oncology, infectious diseases and more.</p>
....

Upvotes: 1

Views: 344

Answers (1)

Gilseung Ahn

Reputation: 2614

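Here is your problem: BeautifulSoup parses results.text, and results comes from the fixed URL "https://www.findamasters.com/masters-degrees/united-kingdom/?40w900", which is fetched once before the loop. Inside the loop you request each page but never parse its response, so soup.find always searches the same first page.

So change the code as follows.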

import re
from random import randint
from time import sleep

import pandas as pd
import requests
from bs4 import BeautifulSoup

base_url = "https://www.findamasters.com/masters-degrees/united-kingdom/?40w900&PG="
info = []

for page in range(1, 1219):
    # Fetch and parse each page inside the loop, so soup changes per iteration
    results = requests.get(base_url + str(page))
    soup = BeautifulSoup(results.text, "html.parser")

    # find() returns only the first match; find_all() returns every matching <p>
    for p in soup.find_all("p", string=re.compile("neuro")):
        info.append(p.get_text())

    # Pause between requests (inside the loop) to avoid hammering the site
    sleep(randint(2, 10))

# Build the DataFrame from everything collected, not just the last match
masters = pd.DataFrame({"info": info})
masters.to_csv("masters.csv")
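
If you want to check the loop logic without hitting the site 1218 times, here is a minimal offline sketch. It is not the code above: it swaps BeautifulSoup for the standard library's HTMLParser so it runs with no installs, and ParagraphCollector, collect_matches, and FAKE_PAGES are hypothetical names made up for illustration. The point it shows is the same, though: each page is fetched and parsed inside the loop, and matches are accumulated rather than overwritten.

```python
import re
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collects the text of every <p> element (a stand-in for soup.find_all("p"))."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._buffer = None  # None while we are outside a <p>

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._buffer is not None:
            self.paragraphs.append("".join(self._buffer))
            self._buffer = None


def collect_matches(page_numbers, fetch_html, pattern):
    """Fetch and parse each page in turn, accumulating matching paragraphs."""
    regex = re.compile(pattern)
    matches = []
    for number in page_numbers:
        parser = ParagraphCollector()  # fresh parse for every page
        parser.feed(fetch_html(number))
        parser.close()
        matches.extend(p for p in parser.paragraphs if regex.search(p))
    return matches


# Hypothetical in-memory pages standing in for the real site.
FAKE_PAGES = {
    1: "<p>MSc in neuroscience</p><p>MSc in history</p>",
    2: "<p>MA in law</p><p>Clinical neuropsychology</p>",
}

found = collect_matches([1, 2], FAKE_PAGES.__getitem__, "neuro")
print(found)  # ['MSc in neuroscience', 'Clinical neuropsychology']
```

Passing the fetch function in as an argument means you can later replace FAKE_PAGES.__getitem__ with a function that calls requests.get and returns the response text, without touching the loop.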

Upvotes: 1
