Reputation: 41
I'm trying to scrape a website that shows 15 ads per page; clicking through to the next page shows the next 15 ads.
For some reason, the script only scrapes one page and never moves on to the next.
Here's my script:
page_num = 10
curr_page = 1
i = 1
car_title, price_hrk, year_made, km_made, date_pub, temp = [], [], [], [], [], []
title = soup.find_all(class_="classified-title")
price_kn = soup.find_all(class_="price-kn")
info = soup.find_all(class_="info-wrapper")
date = soup.find_all("span", class_="date")
# while the current page is less than or equal to the page_num variable
while curr_page <= page_num:
    # make a request with the current page
    page = requests.get("https://www.oglasnik.hr/prodaja-automobila?page={}".format(curr_page))
    # pass it to Beautiful Soup
    soup = BeautifulSoup(page.content, "html.parser")
    # while i is at most 15, the number of elements on a single page
    while i <= 15:
        # check for existence
        if title[i]:
            # append the value
            car_title.append(title[i].get_text())
        else:
            # append NaN
            car_title.append(np.nan)
        if price_kn[i]:
            price_hrk.append(price_kn[i].get_text())
        else:
            price_hrk.append(np.nan)
        if date[i]:
            date_pub.append(date[i].get_text())
        else:
            date_pub.append(np.nan)
        # dual values, so append both to a temporary list
        for tag in info[i].find_all("span", class_="classified-param-value"):
            for val in tag:
                temp.append(val)
        try:
            # if the length of the element is less than 5
            if len(temp[0]) < 5:
                # it's a year, append to the year_made list
                year_made.append(temp[0])
                km_made.append(temp[2])
        except IndexError:
            # if the index is out of bounds, append NaN
            year_made.append(np.nan)
            km_made.append(np.nan)
        # reset temp
        temp = []
        # add 1 to i
        i += 1
    # add 1 to the current page
    curr_page += 1
Now if I print the length of any of the lists, I get 15.
Can someone tell me what I'm doing wrong or point me in the right direction?
Thanks.
Upvotes: 4
Views: 128
Reputation: 24930
You also need to reset your i. After the first page it is left at 16, so the inner while loop never runs again. Right before (or after)
curr_page += 1
add:
i = 1
and it should work.
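For illustration, here is a minimal sketch of the nested loop with the counter reset on every page. It is not the asker's actual scraper: fetch_ads and fake_fetch are hypothetical stand-ins for the requests.get + BeautifulSoup + find_all steps, and the titles are made-up data so the example runs without a network connection. Two things differ from the posted code and are assumptions on my part: the per-page lookup happens inside the outer loop (as posted, the find_all calls run once before the loop, so they would keep pointing at the same results), and i starts at 0 because list indices start at 0, so starting at 1 skips the first ad.

```python
def scrape_all(fetch_ads, page_num):
    """Collect ad titles from pages 1..page_num.

    fetch_ads is a hypothetical callable (page number -> list of titles)
    standing in for requests.get + BeautifulSoup + find_all.
    """
    car_title = []
    curr_page = 1
    while curr_page <= page_num:
        # look up this page's elements on every iteration
        title = fetch_ads(curr_page)
        # reset the index for each page; list indices start at 0
        i = 0
        while i < len(title):
            car_title.append(title[i])
            i += 1
        curr_page += 1
    return car_title


def fake_fetch(page):
    # made-up data: 15 titles per page
    return ["ad {}-{}".format(page, n) for n in range(15)]


titles = scrape_all(fake_fetch, 10)
print(len(titles))  # 150
```

Looping with for title_tag in title instead of a manual counter would avoid the reset problem altogether, since there is no index left over from the previous page.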
Upvotes: 2