When I run this code, the headers list is populated with the results I want; however, they are wrapped in HTML I don't want to keep.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup
# barchart.com uses javascript, so for now I need selenium to get full html
url = 'https://www.barchart.com/stocks/quotes/qqq/constituents'
chrome_options = Options()
chrome_options.add_argument("--headless")
chrome_options.add_argument("--disable-gpu")
browser = webdriver.Chrome(options=chrome_options)
browser.get(url)
page = browser.page_source
# BeautifulSoup find table
soup = BeautifulSoup(page, 'lxml')
table = soup.find("table")
browser.quit()
# create list headers, then populate with th tagged cells
headers = []
for i in table.find_all('th'):
    title = i()
    headers.append(title)
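Calling a BeautifulSoup Tag as if it were a function, as in i() above, is shorthand for i.find_all(). It returns the tag's child tags rather than its text, which would explain why the headers came back wrapped in HTML. A minimal sketch on made-up header markup (the nested <span> elements are an assumption; barchart.com's real cells may be structured differently):

```python
from bs4 import BeautifulSoup

# Hypothetical header markup: each label is wrapped in a <span>.
html = """
<table>
  <tr>
    <th><span>Symbol</span></th>
    <th><span>Name</span></th>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table")

# th() is shorthand for th.find_all(): a list of child tags, not text.
children = [th() for th in table.find_all("th")]
# get_text() returns the text content with the markup removed.
texts = [th.get_text(strip=True) for th in table.find_all("th")]

print(children)  # lists of <span> tags
print(texts)     # ['Symbol', 'Name']
```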
So I tried:
for i in table.find_all('th'):
    title = i.text()
    headers.append(title)
This raised TypeError: 'str' object is not callable.
This approach seemed to work in some example documentation, but the Wikipedia tables used there seem simpler than the ones on Barchart. Any ideas?
As @MendelG pointed out, the error lies in i.text(), because text is a property, not a method: i.text is already a string, so adding () tries to call that string.
Alternatively, you can use get_text(), which is a method.
I would also suggest adding strip() to remove extra whitespace around the text. If you use get_text() instead, it has a strip option built in:
title = i.get_text(strip=True)
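To see the property/method difference in isolation, here is a minimal sketch on a single made-up header cell (not barchart.com's markup). It shows that .text already is a str, that calling it reproduces the question's TypeError, and that .text.strip() and get_text(strip=True) give the same result for a simple cell:

```python
from bs4 import BeautifulSoup

# A single hypothetical header cell with surrounding whitespace.
th = BeautifulSoup("<th>  Symbol  </th>", "html.parser").th

print(type(th.text))                  # .text is a property, so this is a plain str
print(repr(th.text.strip()))          # 'Symbol'
print(repr(th.get_text(strip=True)))  # 'Symbol'

# Calling the property's value reproduces the error from the question.
error_message = None
try:
    th.text()
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```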
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup
# barchart.com uses javascript, so for now I need selenium to get full html
url = 'https://www.barchart.com/stocks/quotes/qqq/constituents'
chrome_options = Options()
chrome_options.add_argument("--headless")
chrome_options.add_argument("--disable-gpu")
browser = webdriver.Chrome(options=chrome_options)
browser.get(url)
page = browser.page_source
# BeautifulSoup find table
soup = BeautifulSoup(page, 'lxml')
table = soup.find("table")
browser.quit()
# create list headers, then populate with th tagged cells
headers = []
for i in table.find_all('th'):
    title = i.text.strip()
    # Or alternatively:
    # title = i.get_text(strip=True)
    headers.append(title)
print(headers)
This prints:
['Symbol', 'Name', '% Holding', 'Shares', 'Links']
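The two options in the loop are interchangeable for simple cells, but not in general: get_text(strip=True) strips each text fragment separately and joins the pieces with an empty separator, so a label split across nested tags can lose its internal space. A minimal sketch on a made-up cell (the nested <span> is an assumption, not barchart.com's actual markup):

```python
from bs4 import BeautifulSoup

# Hypothetical header whose text is split across a nested tag.
th = BeautifulSoup("<th>% <span>Holding</span></th>", "html.parser").th

via_text = th.text.strip()              # '% Holding': strips only the ends
via_strip = th.get_text(strip=True)     # '%Holding': each fragment stripped, joined with ""
via_sep = th.get_text(" ", strip=True)  # '% Holding': fragments joined with a space
print(via_text, via_strip, via_sep, sep=" | ")
```

Passing a separator such as " " as the first argument to get_text() keeps the fragments apart while still trimming whitespace.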