epsilon

Reputation: 15

BeautifulSoup: how to extract specific element from parsed html

I have the following code:

import requests
from bs4 import BeautifulSoup

URL = 'https://fisheries.msc.org/en/fisheries/aafa-and-wfoa-north-pacific-albacore-tuna/@@view'
page = requests.get(URL)

soup = BeautifulSoup(page.content, 'html.parser')

results = soup.find("div", attrs={"class":'slab fishery-specs'})
print(results.prettify())

It outputs a block of HTML, but I'm only looking to extract "7738 (2018)", which sits right under "Tonnage" in the last <div class="fishery-spec">. Does anyone know how I can extract just that?

Upvotes: 0

Views: 63

Answers (3)

import requests
from bs4 import BeautifulSoup


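def main(url):
    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'lxml')  # needs the lxml package; 'html.parser' also works
    # Find the text node that is exactly "Tonnage", then take the first <p> after it
    print(soup.find(string='Tonnage').find_next('p').text)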


main('https://fisheries.msc.org/en/fisheries/aafa-and-wfoa-north-pacific-albacore-tuna/@@view')

Output:

7738 (2018)
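If the label on the live page has whitespace or newlines around it, an exact string='Tonnage' match returns None and the chained find_next raises AttributeError. Below is a minimal sketch of a more tolerant variant. The html string is hypothetical (made up for illustration, not copied from the MSC page) and assumes each spec is an h5 label followed by a p value:

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup: assumes an <h5> label followed by a <p> value,
# with stray whitespace around the label text.
html = """
<div class="slab fishery-specs">
  <div class="fishery-spec"><h5>
      Tonnage
  </h5><p>7738 (2018)</p></div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# A regex tolerates leading/trailing whitespace that an exact string match would miss.
label = soup.find(string=re.compile(r"^\s*Tonnage\s*$"))
# Guard against a missing label instead of letting find_next fail on None.
tonnage = label.find_next("p").get_text(strip=True) if label else None
print(tonnage)
```

get_text(strip=True) also trims whitespace from the value itself, which .text does not.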

Upvotes: 0

norie

Reputation: 9867

You could use select with a CSS selector to get the p elements inside any div with the class 'fishery-spec' that contains the text 'Tonnage', then print the tonnage from those.

import requests
from bs4 import BeautifulSoup

URL = 'https://fisheries.msc.org/en/fisheries/aafa-and-wfoa-north-pacific-albacore-tuna/@@view'
page = requests.get(URL)

soup = BeautifulSoup(page.content, 'html.parser')

# ':-soup-contains' is soupsieve's current name for the deprecated ':contains'
results = soup.select('div.fishery-spec:-soup-contains("Tonnage") p')

for txt in results:
    print(txt.text)
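A sketch of a tighter selector, run against a small hypothetical HTML snippet (not the real page) so it works offline. It assumes the value p directly follows the h5 label, which is a guess about the page's structure, and :-soup-contains needs soupsieve 2.1 or later:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two spec blocks, only one labelled "Tonnage".
html = """
<div class="fishery-spec"><h5>Species</h5><p>Albacore tuna</p></div>
<div class="fishery-spec"><h5>Tonnage</h5><p>7738 (2018)</p></div>
"""

soup = BeautifulSoup(html, "html.parser")

# "h5:-soup-contains(...) + p" selects the <p> immediately after the matching <h5>,
# so only the value paragraph is returned even if the div holds other <p> elements.
cell = soup.select_one('div.fishery-spec h5:-soup-contains("Tonnage") + p')
tonnage = cell.get_text(strip=True) if cell else None
print(tonnage)
```

select_one returns the first match or None, so there is no loop when you expect a single value.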

Upvotes: 0

Epsi95

Reputation: 9047

You can iterate over all the divs with the class fishery-spec and, in the one whose h5 heading is Tonnage, extract the text of its p element.

import requests
from bs4 import BeautifulSoup

URL = 'https://fisheries.msc.org/en/fisheries/aafa-and-wfoa-north-pacific-albacore-tuna/@@view'
page = requests.get(URL)

soup = BeautifulSoup(page.content, 'html.parser')

results = soup.find_all("div", attrs={'class': 'fishery-spec'})

output = None

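for each_result in results:
    heading = each_result.find('h5')
    # Skip spec blocks without an <h5> to avoid an AttributeError on None
    if heading is not None and heading.text.strip() == 'Tonnage':
        output = each_result.find('p').text
        break

print(output)

Output:

7738 (2018)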
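If you need more than the tonnage, the same loop can collect every label/value pair into a dict. This is a sketch against hypothetical markup that assumes each fishery-spec div holds one h5 label and one p value; blocks missing either are skipped:

```python
from bs4 import BeautifulSoup

# Hypothetical markup, not copied from the real page.
html = """
<div class="slab fishery-specs">
  <div class="fishery-spec"><h5>Species</h5><p>Albacore tuna</p></div>
  <div class="fishery-spec"><p>No label here</p></div>
  <div class="fishery-spec"><h5>Tonnage</h5><p>7738 (2018)</p></div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

specs = {}
# class_="fishery-spec" matches divs whose class list includes that exact class,
# so the outer "slab fishery-specs" wrapper is not matched.
for spec in soup.find_all("div", class_="fishery-spec"):
    label, value = spec.find("h5"), spec.find("p")
    if label is None or value is None:
        continue  # skip blocks that don't follow the h5/p pattern
    specs[label.get_text(strip=True)] = value.get_text(strip=True)

print(specs.get("Tonnage"))
```

specs.get returns None rather than raising if the label is absent.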

Upvotes: 1
