Reputation: 2065
I'm trying to extract "Information Technology" as an output from my Beautiful Soup search, but I can't figure it out yet because the "sector" parameter in the link's URL is a dynamic value that changes with the ticker.
Can anyone advise me on how to extract this information?
<a href="http://eresearch.fidelity.com/eresearch/markets_sectors/sectors/sectors_in_market.jhtml?tab=learn&sector=45">Information Technology</a>
My code:
import requests
from bs4 import BeautifulSoup
url = 'https://eresearch.fidelity.com/eresearch/goto/evaluate/snapshot.jhtml?symbols=AAPL'
html = requests.get(url).text
detail_tags_sector = BeautifulSoup(html, 'lxml')
detail_tags_sector.find_all('a')
Upvotes: 1
Views: 117
Reputation: 11
To get the text from an anchor element, you need to access the .text attribute on each of your anchor elements.
So your code would be changed to:
import requests
from bs4 import BeautifulSoup
url = 'https://eresearch.fidelity.com/eresearch/goto/evaluate/snapshot.jhtml?symbols=AAPL'
contents = []
html = requests.get(url).text
detail_tags_sector = BeautifulSoup(html, 'html.parser')
# Collect the text of every anchor on the page
for anchor in detail_tags_sector.find_all('a'):
    contents.append(anchor.text)
print(contents)
Upvotes: 1
Reputation: 24930
The problem with these answers is that they collect the text of all the links on the page, and there are quite a few. If the idea is to pick out only the "Information Technology" string, all you need to do is add:
info = soup.select_one('[href*="sectors_in"]')
print(info.text)
Output:
Information Technology
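For completeness, here is a self-contained version of the same idea (a minimal sketch, assuming the page still serves the sector link with an href containing "sectors_in", as in the HTML quoted in the question):
import requests
from bs4 import BeautifulSoup
url = 'https://eresearch.fidelity.com/eresearch/goto/evaluate/snapshot.jhtml?symbols=AAPL'
html = requests.get(url).text
soup = BeautifulSoup(html, 'html.parser')
# The sector link is the only anchor whose href points at the sectors_in_market page
info = soup.select_one('[href*="sectors_in"]')
print(info.text if info is not None else None)
This keeps the scrape independent of the ticker, since the selector matches the link's target rather than its text.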
Upvotes: 0
Reputation: 33384
You can use any of the options below.
import requests
from lxml.html.soupparser import fromstring
url = 'https://eresearch.fidelity.com/eresearch/goto/evaluate/snapshot.jhtml?symbols=AAPL'
html = requests.get(url).text
soup = fromstring(html)
findSearch = soup.xpath('//a[contains(text(), "Information Technology")]/text()')
print(findSearch[0])
Or
from bs4 import BeautifulSoup
import requests
url = 'https://eresearch.fidelity.com/eresearch/goto/evaluate/snapshot.jhtml?symbols=AAPL'
html = requests.get(url).text
detail_tags_sector = BeautifulSoup(html, 'lxml')
# Print the text of every anchor on the page
for link in detail_tags_sector.find_all('a'):
    print(link.text)
Or
from bs4 import BeautifulSoup
import requests
url = 'https://eresearch.fidelity.com/eresearch/goto/evaluate/snapshot.jhtml?symbols=AAPL'
html = requests.get(url).text
soup = BeautifulSoup(html, 'html.parser')
for link in soup.find_all('a'):
    print(link.text)
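A caveat on the first (XPath) option: matching on the literal text "Information Technology" only works when you already know the sector, which defeats the purpose for a dynamic ticker. Matching on the href instead keeps it ticker-independent (a minimal sketch, assuming the sector link still points at sectors_in_market.jhtml as shown in the question):
import requests
from lxml.html.soupparser import fromstring
url = 'https://eresearch.fidelity.com/eresearch/goto/evaluate/snapshot.jhtml?symbols=AAPL'
html = requests.get(url).text
soup = fromstring(html)
# Select the anchor by its href rather than by its ticker-dependent text
sector = soup.xpath('//a[contains(@href, "sectors_in_market")]/text()')
print(sector[0] if sector else None)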
Please let me know if this helps.
Upvotes: 0