How do I find specific text on a webpage using BeautifulSoup?

I need to get the last page number of a manga chapter from this webpage. The dropdown list on the page contains the string 'Last Page (57)', and I want to extract that number using Beautiful Soup.

import bs4 as bs
import requests

ref = requests.get('https://readms.net/r/onepunch_man/083/4685/3')
soup = bs.BeautifulSoup(ref.text, 'lxml')

# Find the last page number in the page source

print(soup.find_all(string='Last Page'))

Upvotes: 0

Answers (3)

QHarr

Reputation: 84465

With bs4 4.7.1+ you can use the :contains pseudo-class to select the a tag whose text contains Last Page, then take the page number from the last segment of its href:

import requests
from bs4 import BeautifulSoup

r = requests.get('https://readms.net/r/onepunch_man/083/4685/3')
soup = BeautifulSoup(r.content, 'lxml')
last_page = int(soup.select_one('a:contains("Last Page")')['href'].split('/')[-1])

Less robust:

You could instead match by position, using the selector

.btn-reader-page li:last-child a
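Here is a minimal sketch of both selectors run against an inline snippet, so it can be tried without hitting the site. The markup in SAMPLE_HTML is hypothetical, modelled only on the selectors and text mentioned in this thread, and the helper names are mine. It uses :-soup-contains, the newer spelling of :contains, which assumes Soup Sieve 2.1 or later, and html.parser so that lxml isn't required.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the selectors in this answer;
# the real page may be structured differently.
SAMPLE_HTML = """
<div class="btn-group btn-reader-page">
  <ul class="dropdown-menu">
    <li><a href="/r/onepunch_man/083/4685/1">First Page</a></li>
    <li><a href="/r/onepunch_man/083/4685/2">2</a></li>
    <li><a href="/r/onepunch_man/083/4685/57">Last Page (57)</a></li>
  </ul>
</div>
"""


def page_number_from_link(link):
    """Return the last path segment of the link's href as an int, or None."""
    if link is None or not link.get("href"):
        return None
    tail = link["href"].rstrip("/").split("/")[-1]
    return int(tail) if tail.isdigit() else None


def last_page_from_href(html):
    """Text match: the first <a> whose text contains 'Last Page'."""
    soup = BeautifulSoup(html, "html.parser")
    return page_number_from_link(soup.select_one('a:-soup-contains("Last Page")'))


def last_page_by_position(html):
    """Positional match: the <a> inside the last <li> of the pager."""
    soup = BeautifulSoup(html, "html.parser")
    return page_number_from_link(soup.select_one(".btn-reader-page li:last-child a"))


print(last_page_from_href(SAMPLE_HTML))
print(last_page_by_position(SAMPLE_HTML))
```

Both helpers return None instead of raising when the link is missing, which is easier to handle if the page layout changes.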

Upvotes: 0

Ankit Agrawal

Reputation: 654

Use this code:

res = soup.find_all("ul",{"class":"dropdown-menu"})[-1].find_all("li")[-1].text
print(res)

output:

'Last Page (57)'

To find the number, use:

import re
last_page_number = re.findall(r"\d+", res)
print(last_page_number)

output:

['57']
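If you would rather not depend on the position of the list, here is a rough sketch that looks up the text node directly. The markup is hypothetical and the function name is mine. Note that a plain string such as string='Last Page' only matches a text node that equals it exactly, which is why the attempt in the question finds nothing; a compiled pattern matches any node that contains it.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup; the real page may be structured differently.
SAMPLE_HTML = """
<ul class="dropdown-menu">
  <li><a href="/r/onepunch_man/083/4685/1">First Page</a></li>
  <li><a href="/r/onepunch_man/083/4685/57">Last Page (57)</a></li>
</ul>
"""


def last_page_from_text(html):
    """Return the first number in the 'Last Page' text node, or None."""
    soup = BeautifulSoup(html, "html.parser")
    # A compiled regex matches text nodes that contain the pattern.
    node = soup.find(string=re.compile(r"Last Page"))
    if node is None:
        return None
    digits = re.search(r"\d+", node)
    return int(digits.group()) if digits else None


print(last_page_from_text(SAMPLE_HTML))
```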

Upvotes: 1

Alderven

Reputation: 8270

You don't need BeautifulSoup for this. Simply search the page source for the Last Page item with a regular expression:

import re
import requests

r = requests.get('https://readms.net/r/onepunch_man/083/4685/3').text
last_page = re.findall(r'Last Page \((\d+)\)', r)[0]
print(last_page)

Output:

57
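One caveat: indexing the findall result with [0] raises IndexError when the text isn't found. A slightly more defensive sketch using only the standard library is below. The sample strings and the function name are made up for illustration, and \s* is my own addition so that both 'Last Page(57)' and 'Last Page (57)' match.

```python
import re

# \s* tolerates an optional space before the parenthesis.
LAST_PAGE = re.compile(r"Last Page\s*\((\d+)\)")


def extract_last_page(page_source):
    """Return the last page number as an int, or None if it is absent."""
    match = LAST_PAGE.search(page_source)
    return int(match.group(1)) if match else None


# Made-up fragments standing in for the downloaded page source.
print(extract_last_page('<a href="/r/onepunch_man/083/4685/57">Last Page (57)</a>'))
print(extract_last_page("<p>no pager here</p>"))
```

In the real script you would pass the response text from requests.get(...) to extract_last_page.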

Upvotes: 0
