Reputation: 39
I'm new to web scraping. I want the scraper to return all paragraphs containing the keyword "neuro", but when I run the code it seems to return the same output on every iteration. Could you kindly point out my mistake?
import requests
from requests import get
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np
import re
from time import sleep
from random import randint

url = "https://www.findamasters.com/masters-degrees/united-kingdom/?40w900"
results = requests.get(url)
info = []
page_number = np.arange(1, 1219)
soup = BeautifulSoup(results.text, "html.parser")
for page in page_number:
    page = requests.get("https://www.findamasters.com/masters-degrees/united-kingdom/?40w900&PG=" + str(page))
    div = soup.find("p", string=re.compile('neuro'))
    sleep(randint(2, 10))
    masters = pd.DataFrame({
        'info': div})
    masters.to_csv('masters.csv')
But the only output I get is:
<p>It’s our mission to prolong and improve the lives of patients, and we seek to do this by conducting world-leading research in areas such as neuroscience, oncology, infectious diseases and more.</p>
<p>It’s our mission to prolong and improve the lives of patients, and we seek to do this by conducting world-leading research in areas such as neuroscience, oncology, infectious diseases and more.</p>
....
Upvotes: 1
Views: 344
Reputation: 2614
Here is your problem: BeautifulSoup only ever parses results.text, and results is fetched just once, before the loop, from the fixed URL "https://www.findamasters.com/masters-degrees/united-kingdom/?40w900". Inside the loop you fetch each page but never parse the response, so soup.find() searches the same page on every iteration. You need to fetch and re-parse inside the loop, and collect the matches as you go. Change the code as follows.
import requests
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np
import re
from time import sleep
from random import randint

info = []
page_number = np.arange(1, 1219)
for page in page_number:
    # build the URL for this page and fetch it
    page_url = "https://www.findamasters.com/masters-degrees/united-kingdom/?40w900&PG=" + str(page)
    results = requests.get(page_url)
    # re-parse the freshly fetched page, not the one from before the loop
    soup = BeautifulSoup(results.text, "html.parser")
    # find_all collects every matching paragraph on the page, not just the first
    info.extend(soup.find_all("p", string=re.compile('neuro')))
    sleep(randint(2, 10))

masters = pd.DataFrame({'info': info})
masters.to_csv('masters.csv')
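As a quick illustration of why the re-parse matters: a BeautifulSoup object only ever reflects the HTML string it was built from, so each fetched page needs its own soup. The sketch below uses two made-up HTML snippets (not the real site) to show that building a fresh soup per page lets the regex match paragraphs from both pages.

```python
import re
from bs4 import BeautifulSoup

# hypothetical page contents standing in for two fetched responses
page_one = "<p>Research in neuroscience.</p><p>History of art.</p>"
page_two = "<p>Advanced neuroimaging methods.</p>"

matches = []
for html in (page_one, page_two):
    # a new soup per page, mirroring the fix in the loop above
    soup = BeautifulSoup(html, "html.parser")
    for p in soup.find_all("p", string=re.compile("neuro")):
        matches.append(p.get_text())

print(matches)  # ['Research in neuroscience.', 'Advanced neuroimaging methods.']
```

If the soup were built once from page_one only, the second page's paragraph would never appear, which is exactly the symptom in the question.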
Upvotes: 1