Reputation: 15
I have the following code:
import requests
from bs4 import BeautifulSoup
URL = 'https://fisheries.msc.org/en/fisheries/aafa-and-wfoa-north-pacific-albacore-tuna/@@view'
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
results = soup.find("div", attrs={"class":'slab fishery-specs'})
print(results.prettify())
It outputs a block of HTML, but I only want to extract "7738 (2018)", which appears right under "Tonnage" in the last <div class="fishery-spec">. Does anyone know how I can extract just that?
Upvotes: 0
Views: 63
Reputation: 11525
import requests
from bs4 import BeautifulSoup
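def main(url):
    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'lxml')
    # Find the "Tonnage" text node, then take the next <p> after it.
    # (string= is the current name for the older text= argument.)
    print(soup.find(string='Tonnage').find_next('p').text)

main('https://fisheries.msc.org/en/fisheries/aafa-and-wfoa-north-pacific-albacore-tuna/@@view')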
Output:
7738 (2018)
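Note that find returns None when no matching text node exists, so the chained call above raises AttributeError if the label is missing or the page layout changes. Below is a minimal sketch of a guarded version. It runs against a hypothetical HTML snippet that is only assumed to resemble the page's markup; the helper name spec_value and the sample data are illustrative, not taken from the site.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, assumed to resemble the page's fishery-spec blocks.
SAMPLE_HTML = """
<div class="slab fishery-specs">
  <div class="fishery-spec"><h5>Species</h5><p>Albacore</p></div>
  <div class="fishery-spec"><h5>Tonnage</h5><p>7738 (2018)</p></div>
</div>
"""


def spec_value(html, label):
    """Return the text of the first <p> after the given label, or None."""
    soup = BeautifulSoup(html, "html.parser")
    heading = soup.find(string=label)  # exact match on the text node
    if heading is None:
        return None  # label not present on the page
    value = heading.find_next("p")
    return value.get_text(strip=True) if value is not None else None


print(spec_value(SAMPLE_HTML, "Tonnage"))
print(spec_value(SAMPLE_HTML, "Gear type"))
```

Returning None instead of raising lets the caller decide how to handle a missing field.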
Upvotes: 0
Reputation: 9867
You could use select to get all the div elements with the class 'fishery-spec' that contain the text 'Tonnage', then read the tonnage from the p inside them.
import requests
from bs4 import BeautifulSoup
URL = 'https://fisheries.msc.org/en/fisheries/aafa-and-wfoa-north-pacific-albacore-tuna/@@view'
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
results = soup.select('div.fishery-spec:contains("Tonnage") p')
for txt in results:
    print(txt.text)
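Newer releases of soupsieve (the selector engine behind select) deprecate :contains in favour of :-soup-contains. A minimal sketch of the same query with the newer name follows, assuming soupsieve 2.1 or later. It runs against a hypothetical snippet that is only assumed to resemble the page's markup, so the sample data is illustrative.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, assumed to resemble the page's fishery-spec blocks.
SAMPLE_HTML = """
<div class="slab fishery-specs">
  <div class="fishery-spec"><h5>Species</h5><p>Albacore</p></div>
  <div class="fishery-spec"><h5>Tonnage</h5><p>7738 (2018)</p></div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Select every <p> inside a div.fishery-spec whose text contains "Tonnage".
# The outer div has class "fishery-specs", so it does not match the selector.
matches = soup.select('div.fishery-spec:-soup-contains("Tonnage") p')
values = [p.get_text(strip=True) for p in matches]
print(values)
```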
Upvotes: 0
Reputation: 9047
You need to iterate over all the divs with the class fishery-spec, then extract the data from the one whose h5 is Tonnage.
import requests
from bs4 import BeautifulSoup
URL = 'https://fisheries.msc.org/en/fisheries/aafa-and-wfoa-north-pacific-albacore-tuna/@@view'
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
results = soup.find_all("div", attrs={'class': 'fishery-spec'})
output = None
for each_result in results:
    # Stop at the first block whose heading is exactly "Tonnage".
    if each_result.find('h5').text == 'Tonnage':
        output = each_result.find('p').text
        break
print(output)
Output:
7738 (2018)
Upvotes: 1