change198
change198

Reputation: 2065

Dynamically find href tag

I'm trying to extract "Information Technology" as an output from my beautiful soup search. But I can't yet figure it out as the "sector" is a dynamic value for any kind of ticker in URL.

Can anyone advise me how to extract this information?

<a href="http://eresearch.fidelity.com/eresearch/markets_sectors/sectors/sectors_in_market.jhtml?tab=learn&amp;sector=45">Information Technology</a>

My code:

url = 'https://eresearch.fidelity.com/eresearch/goto/evaluate/snapshot.jhtml?symbols=AAPL'

html = requests.get(url).text    
detail_tags_sector = BeautifulSoup(html, 'lxml')
detail_tags_sector.find_all('a')

Upvotes: 1

Views: 117

Answers (3)

Ethan
Ethan

Reputation: 11

To get the text from an anchor element you need to access the .text variable on each of your anchor elements
So your code would be changed to:

url = 'https://eresearch.fidelity.com/eresearch/goto/evaluate/snapshot.jhtml?symbols=AAPL'
contents = []

html = requests.get(url).text    
detail_tags_sector = BeautifulSoup(html, 'html.paser')
for anchor in detail_tags_sector.find_all('a'):
    contents.append(anchor.text)
print(contents)

Upvotes: 1

Jack Fleeting
Jack Fleeting

Reputation: 24930

The problem with these answers is that they collect the text of all the links on the page, and there are quite a few. If the idea is to pick out only the information technology string, all you need to do is add:

info = soup.select_one('[href*="sectors_in"]')
print(info.text)

Output:

Information Technology

Upvotes: 0

KunduK
KunduK

Reputation: 33384

You can use either of below options.

import requests
from lxml.html.soupparser import fromstring
url = 'https://eresearch.fidelity.com/eresearch/goto/evaluate/snapshot.jhtml?symbols=AAPL'
html = requests.get(url).text
soup=fromstring(html)
findSearch = soup.xpath('//a[contains(text(), "Information Technology")]/text()')
print(findSearch[0])

Or

from bs4 import BeautifulSoup
from lxml import html
import requests
url = 'https://eresearch.fidelity.com/eresearch/goto/evaluate/snapshot.jhtml?symbols=AAPL'

html = requests.get(url).text
detail_tags_sector = BeautifulSoup(html, 'lxml')
for link in detail_tags_sector.find_all('a'):
    print(link.text)

OR

from bs4 import BeautifulSoup    
import requests
url = 'https://eresearch.fidelity.com/eresearch/goto/evaluate/snapshot.jhtml?symbols=AAPL'
html = requests.get(url).text
soup = BeautifulSoup(html, 'html.parser')
for link in soup.find_all('a'):
    print(link.text)

Please let me know if this helps.

Upvotes: 0

Related Questions