Reputation: 19
I need to get the last page number of the manga from this webpage. The dropdown list on the page contains the string 'Last Page(57)'. I want to find the last page number using Beautiful Soup.
import bs4 as bs
import requests
ref = requests.get('https://readms.net/r/onepunch_man/083/4685/3')
soup = bs.BeautifulSoup(ref.text, 'lxml')
#FIND OUT THE LAST PAGE NUMBER FROM THE SOURCE CODE!!!
print(soup.find_all(string='Last Page'))
Upvotes: 0
Views: 837
Reputation: 84465
With bs4 4.7.1 you can use :contains to get the a tag with Last Page in its innerText:
import requests
from bs4 import BeautifulSoup
r = requests.get('https://readms.net/r/onepunch_man/083/4685/3')
soup = BeautifulSoup(r.content, 'lxml')
last_page = int(soup.select_one('a:contains("Last Page")')['href'].split('/')[-1])
Less robust: you could match by position with the selector .btn-reader-page li:last-child a
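Both ideas can be tried offline against a small HTML fragment. The sketch below is only an illustration: SAMPLE_HTML is made-up markup that assumes the dropdown looks roughly like the selectors in this answer suggest, and the helper names last_page_by_text and last_page_by_position are hypothetical. The text match uses find() with a compiled regex instead of :contains, because newer Soup Sieve releases deprecate :contains in favor of :-soup-contains, and the regex form does not depend on which of the two is available.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page; the actual structure may differ.
SAMPLE_HTML = """
<div class="btn-group btn-reader-page">
  <ul class="dropdown-menu">
    <li><a href="/r/onepunch_man/083/4685/1">First Page</a></li>
    <li><a href="/r/onepunch_man/083/4685/2">2</a></li>
    <li><a href="/r/onepunch_man/083/4685/57">Last Page (57)</a></li>
  </ul>
</div>
"""

def page_from_href(link):
    """Take the last path segment of the link's href as the page number."""
    if link is None:
        return None
    return int(link["href"].rstrip("/").split("/")[-1])

def last_page_by_text(soup):
    # A compiled regex matches substrings; string='Last Page' would require an exact match.
    return page_from_href(soup.find("a", string=re.compile(r"Last Page")))

def last_page_by_position(soup):
    # Relies on the last <li> of the dropdown always being the "Last Page" entry.
    return page_from_href(soup.select_one(".btn-reader-page li:last-child a"))

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
print(last_page_by_text(soup))
print(last_page_by_position(soup))
```

The positional version breaks as soon as the site appends another entry to the list, which is why it is the less robust of the two.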
Upvotes: 0
Reputation: 654
Use this code:
res = soup.find_all("ul",{"class":"dropdown-menu"})[-1].find_all("li")[-1].text
print(res)
output:
'Last Page (57)'
To find the number, use:
import re
# findall returns a list of matches, so take the first one
last_page_number = re.findall(r"\d+", res)[0]
print(last_page_number)
Output:
57
Upvotes: 1
Reputation: 8270
You don't need to use BeautifulSoup. Simply search the page source for the Last Page item:
import re
import requests
r = requests.get('https://readms.net/r/onepunch_man/083/4685/3').text
last_page = re.findall(r'Last Page \((\d+)\)', r)[0]
Output:
57
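A slightly more defensive sketch of the same regex idea, run against a hard-coded string rather than the live page. SAMPLE_SOURCE and the function name extract_last_page are assumptions made for illustration. The pattern allows optional whitespace before the parenthesis, since the question quotes the label as 'Last Page(57)', and the function returns None instead of raising IndexError when the label is missing.

```python
import re

# Hypothetical fragment standing in for the downloaded page source.
SAMPLE_SOURCE = '<li><a href="/r/onepunch_man/083/4685/57">Last Page (57)</a></li>'

# \s* tolerates both "Last Page (57)" and "Last Page(57)".
LAST_PAGE_RE = re.compile(r"Last Page\s*\((\d+)\)")

def extract_last_page(source):
    """Return the last page number as an int, or None if the label is absent."""
    match = LAST_PAGE_RE.search(source)
    return int(match.group(1)) if match else None

print(extract_last_page(SAMPLE_SOURCE))
```

With the real page you would pass requests.get(...).text as source.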
Upvotes: 0