Reputation: 439
I was practicing web scraping with Python 3 and ran into a situation I have not seen before.
I want to scrape a post together with the replies to that post.
Each post can have many replies, and when there are enough of them, the replies are spread across multiple pages.
The post in this example currently has 8 pages of replies.
I looked for a class that lists the page numbers so that I could loop over them. Below is my simplified code, but some elements of the list are abbreviated. I expected ['1','2','3','4','5','6','7','8'],
but the list actually came out as ['1','2','3','...','8'],
so Python sees a length of only 5, not 8.
How can I deal with this?
import requests
from bs4 import BeautifulSoup
import time

html = requests.get("https://community.withairbnb.com/t5/Hosting/Are-you-planning-to-spruce-up-your-space-in-2020/td-p/1165554", timeout=5)
soup = BeautifulSoup(html.content, 'html.parser')
time.sleep(1)
pages = soup.find('ul', class_="lia-paging-full-pages")
pages = pages.text.strip()
split_page = pages.split()
print(split_page)
for page_num in range(2, len(split_page) + 1, 1):
    # some lines of my code
    print(page_num)
output:
['1', '2', '3', '…', '8']
2
3
4
5
Upvotes: 2
Views: 148
Reputation: 4676
Assuming that the last number represents the total number of pages, you can simply do
int(['1', '2', '3', '...', '8'][-1])
and then you have the total number of pages, which is what the loop needs, rather than the length of the list. For your case:
for page_num in range(2, int(split_page[-1]) + 1, 1):
    print(page_num)
2
3
4
5
6
7
8
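If you would rather not depend on the last element being a number, a slightly more defensive variant is to keep only the numeric tokens and take the largest one. This is only a sketch: the helper name last_page_number is made up for illustration, the list is hard-coded to the output shown in the question so the snippet runs without a network request, and the fallback to 1 assumes that a thread with no paging links has a single page.

```python
def last_page_number(tokens):
    """Return the largest numeric token, ignoring placeholders such as '…'."""
    # str.isdecimal() is False for '…' and '...', so those tokens are skipped.
    numbers = [int(tok) for tok in tokens if tok.isdecimal()]
    # Assumption: no numeric tokens means the thread has only one page.
    return max(numbers, default=1)


split_page = ['1', '2', '3', '…', '8']  # the list printed in the question
total_pages = last_page_number(split_page)

# Page 1 is already loaded, so start from page 2 as in the question.
for page_num in range(2, total_pages + 1):
    print(page_num)
```

With the list from the question this loops over pages 2 through 8, the same as the split_page[-1] version, but it should not raise if the site ever appends a non-numeric token after the last page number.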
Upvotes: 2