Reputation: 6662
My first attempt at learning scraping. I am trying to get the official names of members of the U.S. Congress.
I successfully did a POST; response.content is indeed the full HTML string. But somehow lxml and bs4 aren't helping me get the name out.
Here's a short example, searching for the last name "Waxman" on this site. The result I want is the person's full name, as stated in the table. I used Inspect Element > Copy XPath on the name.
from lxml import html
import requests
shortname = 'WAXMAN'
state = 'California'
chamber = 'House'
url = 'http://bioguide.congress.gov/biosearch/biosearch1.asp'
formData = {'lastname': shortname}
response = requests.post(url, data=formData)
tree = html.fromstring(response.content)
print(tree.xpath('/html/body/center/table/tbody/tr[1]/td[1]/a/text()'))
My attempt in BeautifulSoup doesn't work either, but I'm less familiar with that package.
from bs4 import BeautifulSoup
soup = BeautifulSoup(response.content, "lxml")
soup.select('body > center > table > tbody > tr:nth-child(2) > td:nth-child(1) > a')
Upvotes: 1
Views: 875
Reputation: 474191
You can simplify your expression to:
//table//td/a/text()
This prints ['WAXMAN, Henry Arnold'].
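A likely reason the copied XPath returns nothing: browsers usually add a tbody element to tables when building the DOM, even when the raw HTML has none, so a path copied from Inspect Element may not match the source that lxml parses. Here is a minimal sketch of that effect, assuming a made-up SAMPLE snippet in place of the real response (the actual page markup may differ):

```python
from lxml import html

# Hypothetical stand-in for the raw page source (an assumption, not the
# real bioguide markup). Note that it contains no <tbody> element.
SAMPLE = """
<html><body><center>
<table>
  <tr><th>Member Name</th><th>Birth-Death</th></tr>
  <tr><td><a href="#">WAXMAN, Henry Arnold</a></td><td>1939-</td></tr>
</table>
</center></body></html>
"""

tree = html.fromstring(SAMPLE)

# Path as copied from the browser: it requires a <tbody> step that the
# parsed source does not have, so nothing matches.
browser_path = tree.xpath('/html/body/center/table/tbody/tr[1]/td[1]/a/text()')

# Simplified path: any link inside any table cell, wherever it sits.
simple_path = tree.xpath('//table//td/a/text()')

print(browser_path)
print(simple_path)
```

With the real page you would pass response.content to html.fromstring instead of SAMPLE. The looser expression trades precision for robustness, so it may pick up extra links if the page has more than one table.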
Upvotes: 1