Reputation: 37
I'm trying to scrape the names of the House members (the text of each entry) from the https://www.congress.gov/members website. I'm very new at this. I followed a tutorial on YouTube and think I am very close.
Here is a snippet of the HTML I am trying to get. The text I want is shown in bold.
Here is the code that I think gets me closest (using Python 2.7 because of work constraints):
import requests, lxml
import lxml.html
#from bs4 import BeautifulSoup
html = requests.get('https://www.congress.gov/members?q=%7B%22congress%22%3A%22117%22%2C%22chamber%22%3A%22Senate%22%7D')
doc = lxml.html.fromstring(html.content)
house = doc.xpath('//div[@id="houseMemberNavigator"]')[0]
print(house)#got printed element div
members = house.xpath('.//select[@id="members-representatives"]/text()')
#returns ['\n ', ' ']
print(members)
I'm sure it's my syntax, but I have not been able to solve it.
Upvotes: 0
Views: 72
Reputation: 14103
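Using BeautifulSoup: the names are the text of the option elements inside the select, not of the select itself, which is why text() on the select only returned whitespace.
from bs4 import BeautifulSoup
# html is the requests response object from the question
soup = BeautifulSoup(html.text, 'lxml')
[data.text for data in soup.find(id='members-representatives').select('option[value]')]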
['Find a Representative',
'Adams, Alma S. [D-NC-12]',
'Aderholt, Robert B. [R-AL-4]',
'Aguilar, Pete [D-CA-31]',
'Allen, Rick W. [R-GA-12]',
'Allred, Colin Z. [D-TX-32]',
'Amodei, Mark E. [R-NV-2]',
'Armstrong, Kelly [R-ND]',
'Arrington, Jodey C. [R-TX-19]',
'Auchincloss, Jake [D-MA-4]',
'Axne, Cynthia [D-IA-3]',
'Babin, Brian [R-TX-36]',
...]
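If you would rather stay with lxml, the same idea can be sketched in XPath by stepping down to the option children before asking for text(). This is a minimal sketch, not a tested scraper for the live site: SAMPLE_HTML is a hypothetical, trimmed-down stand-in for the page markup (the value attributes are made up), and option_texts is a helper name introduced only for illustration.

```python
import lxml.html

# Hypothetical stand-in for the real page markup (an assumption); the
# value attributes below are placeholders, not the site's actual values.
SAMPLE_HTML = """
<div id="houseMemberNavigator">
  <select id="members-representatives">
    <option value="">Find a Representative</option>
    <option value="example-1">Adams, Alma S. [D-NC-12]</option>
    <option value="example-2">Aderholt, Robert B. [R-AL-4]</option>
  </select>
</div>
"""


def option_texts(html_source, select_id):
    """Return the stripped text of each <option> under the <select> with this id."""
    doc = lxml.html.fromstring(html_source)
    # text() directly on the <select> only yields the whitespace between its
    # children; option/text() reaches the visible labels instead.
    texts = doc.xpath('//select[@id=$sid]/option/text()', sid=select_id)
    return [t.strip() for t in texts]


names = option_texts(SAMPLE_HTML, 'members-representatives')
print(names)
```

With the real page you would pass html.content from the question's requests call instead of SAMPLE_HTML. The first entry is the placeholder label, so names[1:] drops it if you only want members.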
Upvotes: 1