Reputation: 50
As part of a larger web scraper built with Python, Selenium, and BeautifulSoup, I'm trying to get the text of all the tooltips on this page: https://www.legis.state.pa.us/CFDocs/Legis/BS/bs_action.cfm?SessId=20190&Sponsors=S|44|0|Katie%20J.%20Muth
My current code successfully fetches all the links and mouses over each one; when I run it, I see each tooltip pop up in succession. However, it only outputs the text of the very first tooltip, and I have no idea why! I thought I might just need a longer wait between mouseovers, but I went as high as 20 seconds and it didn't solve the issue.
Here's the code:
import re
import time
from selenium.webdriver.common.action_chains import ActionChains

# soup (BeautifulSoup of the page) and driver (Selenium WebDriver) are created earlier in the scraper
bill_links = soup.find_all('a', {'id': re.compile('Bill')})
summaries = []
bill_numbers = [link.text.strip() for link in bill_links]
for link in bill_links:
    billid = link.get('id')
    action = ActionChains(driver)
    action.move_to_element(driver.find_element_by_id(billid)).perform()
    time.sleep(5)
    summary = driver.find_element_by_class_name("ToolTip-BillSummary-ShortTitle").text
    print(summary)
    summaries = summaries + [summary]
    action.reset_actions()
Again, the first print(summary) call successfully prints the text of the first tooltip ("An Act amending the act of January 17, 1968..."), but each subsequent print(summary) call just prints a blank.
I'm very new to programming, so apologies if there's an obvious answer.
Upvotes: 2
Views: 219
Reputation: 193258
If you are using Selenium, you don't need BeautifulSoup. To extract the text of all the tooltips on the page https://www.legis.state.pa.us/CFDocs/Legis/BS/bs_action.cfm?SessId=20190&Sponsors=S|44|0|Katie%20J.%20Muth you can use the following solution:
Code Block:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("start-maximized")
chrome_options.add_argument('disable-infobars')
# Selenium 4 takes the chromedriver path through a Service object (executable_path was removed)
driver = webdriver.Chrome(service=Service(r'C:\Utility\BrowserDrivers\chromedriver.exe'), options=chrome_options)
driver.get("https://www.legis.state.pa.us/CFDocs/Legis/BS/bs_action.cfm?SessId=20190&Sponsors=S|44|0|Katie%20J.%20Muth")
for elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//table[@class='DataTable']/tbody//tr/td/a"))):
    # second whitespace-separated token of the link text, i.e. the bill number
    senate_bill_short_number = elem.get_attribute("innerHTML").split()[1]
    ActionChains(driver).move_to_element(elem).perform()
    # wait for the tooltip whose title mentions this bill, then print the second div after the title
    tooltip_xpath = ("//div[@class='ToolTip-BillSummary']/div[@class='ToolTip-BillSummary-Title'"
                     " and contains(., '" + senate_bill_short_number + "')]//following::div[2]")
    tooltip = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, tooltip_xpath)))
    print(tooltip.get_attribute("innerHTML"))
Console Output:
An Act amending the act of January 17, 1968 (P.L.11, No.5), known as The Minimum Wage Act of 1968, further providing for definitions and for minimum wages; providing for gratuities; further providing for enforcement and rules and regulations, for pe ...
An Act providing for mandatory Statewide employer-paid sick leave for employees and for civil penalties and remedies.
An Act amending Title 42 (Judiciary and Judicial Procedure) of the Pennsylvania Consolidated Statutes, in judicial boards and commissions, providing for adoption of guidelines for administrative probation violations; and, in sentencing, further provi ...
An Act amending the act of May 22, 1951 (P.L.317, No.69), known as The Professional Nursing Law, further providing for title, for definitions, for State Board of Nursing, for dietitian-nutritionist license required, for unauthorized practices and ac ...
An Act amending the act of March 4, 1971 (P.L.6, No.2), known as the Tax Reform Code of 1971, providing for Pennsylvania Housing Tax Credit.
An Act amending the act of December 3, 1959 (P.L.1688, No.621), known as the Housing Finance Agency Law, in Pennsylvania Housing Affordability and Rehabilitation Enhancement Program, further providing for fund.
An Act amending the act of March 10, 1949 (P.L.30, No.14), known as the Public School Code of 1949, in charter schools, further providing for funding for charter schools.
An Act amending the act of June 13, 1967 (P.L.31, No.21), known as the Human Services Code, in departmental powers and duties as to supervision, providing for lead testing in children's institutions; and, in departmental powers and duties as to lice ...
An Act providing for the protection of water supplies.
An Act amending Title 35 (Health and Safety) of the Pennsylvania Consolidated Statutes, providing for emergency addiction treatment; and imposing powers and duties on the Department of Drug and Alcohol Programs.
An Act amending Title 18 (Crimes and Offenses) of the Pennsylvania Consolidated Statutes, providing for transfer and sale of animals.
An Act amending Title 42 (Judiciary and Judicial Procedure) of the Pennsylvania Consolidated Statutes, in particular rights and immunities, providing for civil immunity of person rescuing minor from motor vehicle.
An Act providing for health care insurance coverage protections, for duties of the Insurance Department and the Insurance Commissioner, for regulations, for enforcement and for penalties.
An Act amending the act of May 17, 1921 (P.L.682, No.284), known as The Insurance Company Law of 1921, in casualty insurance, providing coverage for essential health benefits.
An Act amending the act of October 27, 1955 (P.L.744, No.222), known as the Pennsylvania Human Relations Act, further providing for definitions and for unlawful discriminatory practices.
An Act amending Titles 18 (Crimes and Offenses) and 42 (Judiciary and Judicial Procedure) of the Pennsylvania Consolidated Statutes, in human trafficking, further providing for the offense of trafficking in individuals and for the offense of patroniz ...
An Act amending Title 75 (Vehicles) of the Pennsylvania Consolidated Statutes, in registration of vehicles, further providing for veteran plates and placard.
An Act providing for health insurance coverage requirements for stage four, advanced metastatic cancer.
An Act authorizing the Commonwealth of Pennsylvania to join the Psychology Interjurisdictional Compact; providing for the form of the compact; imposing additional powers and duties on the Governor, the Secretary of the Commonwealth and the Compact.
An Act amending Titles 42 (Judiciary and Judicial Procedure) and 75 (Vehicles) of the Pennsylvania Consolidated Statutes, in sentencing, further providing for payment of court costs, restitution and fines, for fine and for failure to pay fine; in lic ...
An Act amending the act of January 17, 1968 (P.L.11, No.5), known as The Minimum Wage Act of 1968, further providing for definitions and for rate of minimum wages; and providing for reporting by the Department of Labor and Industry.
An Act amending Title 23 (Domestic Relations) of the Pennsylvania Consolidated Statutes, in marriage license, further providing for restrictions on issuance of license.
An Act amending the act of March 4, 1971 (P.L.6, No.2), known as the Tax Reform Code of 1971, in sales and use tax, further providing for exclusions from tax.
Upvotes: 1
Reputation: 84465
tl;dr:
Selenium isn't needed. If you only need the tooltip text as shown (not the full text), you can use requests and bs4 and replicate the JavaScript function the page uses. The parameters for each function call are in the script tag next to the a tag of each bill listing. I extract them with a regex and pass them to a user-defined function that replicates the jQuery function.
For example, the call for the first bill is AddBillSummaryTooltip('#Bill_1',2019,0,'S','B','0012');
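The answer's regex relies on loose, greedy .* groups. A stricter way to pull the arguments out of a call like the one above is sketched here, assuming every call has the same shape as this example (a quoted selector, two unquoted integers, then three quoted strings). parse_tooltip_call and CALL_RE are hypothetical names, not part of the page.

```python
import re

# Matches the six arguments of a call such as
# AddBillSummaryTooltip('#Bill_1',2019,0,'S','B','0012');
# Quoted arguments are captured without their quotes; the two numeric
# arguments are captured as runs of digits.
CALL_RE = re.compile(
    r"AddBillSummaryTooltip\(\s*'([^']*)'\s*,\s*(\d+)\s*,\s*(\d+)\s*,"
    r"\s*'([^']*)'\s*,\s*'([^']*)'\s*,\s*'([^']*)'\s*\)"
)


def parse_tooltip_call(script_text):
    """Return the call's arguments as a dict, or None if no call is found."""
    m = CALL_RE.search(script_text)
    if m is None:
        return None
    element, year, ind, body, bill_type, bill_no = m.groups()
    return {
        "element": element,
        "SessionYear": year,
        "SessionInd": ind,
        "BillBody": body,
        "BillType": bill_type,
        "BillNo": bill_no,  # kept as a string so the leading zeros survive
    }


print(parse_tooltip_call("AddBillSummaryTooltip('#Bill_1',2019,0,'S','B','0012');"))
```

Everything except "element" maps directly onto the request parameters used by add_bill_summary_tooltip.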
Tooltips:
import re

import requests
from bs4 import BeautifulSoup as bs

g_server_url = "https://www.legis.state.pa.us"


def add_bill_summary_tooltip(s, session_year, session_ind, bill_body, bill_type, bill_no):
    """Request one bill's tooltip HTML the way the page's AddBillSummaryTooltip does and return its short title."""
    url = g_server_url + '/cfdocs/cfc/GenAsm.cfc?returnformat=plain'
    data = {'method': 'GetBillSummaryTooltip',
            'SessionYear': session_year,
            'SessionInd': session_ind,
            'BillBody': bill_body,
            'BillType': bill_type,
            'BillNo': bill_no,
            'IsAjaxRequest': '1'
            }
    r = s.get(url, params=data)
    soup = bs(r.content, 'lxml')
    tooltip = soup.select_one('.ToolTip-BillSummary-ShortTitle')
    if tooltip is not None:
        tooltip = tooltip.text.strip()
    return tooltip


# add_bill_summary_tooltip(s, 2019, 0, 'S', 'B', '0012')  # the request made for '#Bill_1'
with requests.Session() as s:
    r = s.get('https://www.legis.state.pa.us/CFDocs/Legis/BS/bs_action.cfm?SessId=20190&Sponsors=S|44|0|Katie%20J.%20Muth')
    soup = bs(r.content, 'lxml')
    # link text -> text of the script tag next to the link (last character dropped)
    tooltips = {item.select_one('a').text: item.select_one('script').text[:-1] for item in soup.select('.DataTable td:has(a)')}
    p = re.compile(r"'(.*?)',(.*),(.*),'(.*)','(.*)','(.*)'")
    for bill in tooltips:
        # the first captured argument is the element selector (e.g. '#Bill_1'), not needed here
        _, session_year, session_ind, bill_body, bill_type, bill_no = p.findall(tooltips[bill])[0]
        tooltips[bill] = add_bill_summary_tooltip(s, session_year, session_ind, bill_body, bill_type, bill_no)
print(tooltips)
Full text:
If you want the full text, you can collect the links to the full-text pages from the first page, then visit each page in a loop and extract the text:
import requests
from bs4 import BeautifulSoup as bs

g_server_url = "https://www.legis.state.pa.us"


def add_bill_summary_full(s, url):
    """Fetch a bill's page and return the text of its first '.BillInfo-Section-Data div' (the summary)."""
    r = s.get(url)
    soup = bs(r.content, 'lxml')
    summary = soup.select_one('.BillInfo-Section-Data div')
    if summary is not None:
        summary = summary.text
    return summary


with requests.Session() as s:
    r = s.get('https://www.legis.state.pa.us/CFDocs/Legis/BS/bs_action.cfm?SessId=20190&Sponsors=S|44|0|Katie%20J.%20Muth')
    soup = bs(r.content, 'lxml')
    # link text -> absolute URL of the linked page (hrefs are assumed to start with '/')
    full_text = {item.text: g_server_url + item['href'] for item in soup.select('.DataTable a')}
    for k, v in full_text.items():
        full_text[k] = add_bill_summary_full(s, v)
print(full_text)
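Building URLs with g_server_url + item['href'] only works when every href is root-relative (starts with /). A slightly more defensive sketch resolves each href against the listing page with urllib.parse.urljoin, which also handles relative and already-absolute links. absolute_links is a hypothetical helper, and the hrefs in the test are made up for illustration.

```python
from urllib.parse import urljoin

LISTING_URL = ("https://www.legis.state.pa.us/CFDocs/Legis/BS/bs_action.cfm"
               "?SessId=20190&Sponsors=S|44|0|Katie%20J.%20Muth")


def absolute_links(pairs, page_url=LISTING_URL):
    """Map stripped link text to an absolute URL, resolving each href against page_url.

    pairs is any iterable of (link_text, href) tuples.
    """
    return {text.strip(): urljoin(page_url, href) for text, href in pairs}
```

In the loop above it could replace the dict comprehension: full_text = absolute_links((item.text, item['href']) for item in soup.select('.DataTable a')).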
This is the page's own JavaScript function, which uses the jQuery qTip plugin to load each tooltip with an AJAX request:
function AddBillSummaryTooltip(element,SessionYear,SessionInd,BillBody,BillType,BillNo) {
    jQuery(element).qtip({
        content: {
            text: function(event, api) {
                jQuery.ajax({
                    url: g_ServerURL + '/cfdocs/cfc/GenAsm.cfc?returnformat=plain',
                    data: {
                        method: 'GetBillSummaryTooltip',
                        SessionYear: SessionYear,
                        SessionInd: SessionInd,
                        BillBody: BillBody,
                        BillType: BillType,
                        BillNo: BillNo,
                        IsAjaxRequest: 1
                    }
                })
                // ... (remainder of the function omitted)
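Because jQuery.ajax sends a GET request by default and appends the data object to the URL as a query string, the request behind each tooltip can be reproduced with the standard library alone, for example to open it in a browser and inspect the response. This is a minimal sketch; tooltip_request_url is a hypothetical name, and G_SERVER_URL is assumed to equal the page's g_ServerURL, as the Python answer also assumes.

```python
from urllib.parse import urlencode

# Assumed value of the page's g_ServerURL
G_SERVER_URL = "https://www.legis.state.pa.us"


def tooltip_request_url(session_year, session_ind, bill_body, bill_type, bill_no):
    """Build the GET URL that the page's jQuery.ajax call requests for one tooltip."""
    base = G_SERVER_URL + "/cfdocs/cfc/GenAsm.cfc?returnformat=plain"
    params = {
        "method": "GetBillSummaryTooltip",
        "SessionYear": session_year,
        "SessionInd": session_ind,
        "BillBody": bill_body,
        "BillType": bill_type,
        "BillNo": bill_no,
        "IsAjaxRequest": 1,
    }
    # The base URL already has a query string, so the data fields are appended after '&'.
    return base + "&" + urlencode(params)


print(tooltip_request_url(2019, 0, "S", "B", "0012"))
```

This is the same URL that requests builds in add_bill_summary_tooltip when it passes the dict as params.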
Upvotes: 3
Reputation: 13
The problem might be due to this line of your code:
summary = driver.find_element_by_class_name("ToolTip-BillSummary-ShortTitle").text
Your locator uses only the class name. Several elements on the page can have that class, and find_element_by_class_name returns only the first match in document order, so you are not specifying which tooltip to read. If that first match is the earlier tooltip, now hidden but still in the page, .text returns an empty string, because Selenium only returns visible text; that would explain the blanks.
To fix this, use an XPath expression that targets a specific tooltip instead (you need an index variable to locate the element):
summary = driver.find_element_by_xpath("//*[@id='qtip-" + str(index) + "-content']/div/div[3]").text
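Another option, sketched here under the assumption that the tooltip for the link currently hovered is the only displayed element with that class, is to fetch every match with find_elements and keep the one that is displayed. visible_text_by_class is a hypothetical helper; it only uses the standard WebDriver methods find_elements, is_displayed and text.

```python
# Selenium's By.CLASS_NAME is the string "class name"; it is spelled out here
# so the sketch does not need Selenium installed to be exercised.
CLASS_NAME = "class name"


def visible_text_by_class(driver, class_name):
    """Return the stripped text of the first *displayed* element with class_name.

    find_element() returns the first match in document order, which may be an
    earlier, hidden tooltip; this checks every match and skips hidden ones.
    Returns None when no matching element is displayed.
    """
    for element in driver.find_elements(CLASS_NAME, class_name):
        if element.is_displayed():
            return element.text.strip()
    return None
```

In the question's loop, it would replace the find_element_by_class_name line: summary = visible_text_by_class(driver, "ToolTip-BillSummary-ShortTitle").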
Upvotes: 0