Hatshepsut

Reputation: 6662

Web scraping returns empty

My first attempt at learning scraping. I am trying to get the official names of members of the U.S. Congress.

The POST succeeds: response.content is indeed the full HTML string. But somehow lxml and bs4 aren't helping me get the name out.

Here's a short example, searching for last name "Waxman" on this site. The result I want is the person's full name, as stated in the table. I got the XPath via Inspect Element > Copy XPath on the name.

from lxml import html
import requests

shortname = 'WAXMAN'
state = 'California'
chamber = 'House'

url = 'http://bioguide.congress.gov/biosearch/biosearch1.asp'
formData = {'lastname': shortname}

response = requests.post(url, data=formData)
tree = html.fromstring(response.content)
print(tree.xpath('/html/body/center/table/tbody/tr[1]/td[1]/a/text()'))

My attempt in BeautifulSoup doesn't work either, but I'm less familiar with that package.

from bs4 import BeautifulSoup
soup = BeautifulSoup(response.content, "lxml")
soup.select('body > center > table > tbody > tr:nth-child(2) > td:nth-child(1) > a')

Upvotes: 1

Views: 875

Answers (1)

alecxe

Reputation: 474191

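The copied XPath most likely fails because browsers insert a tbody element when they build the DOM, while the raw HTML that requests receives usually has none. You can simplify your expression to: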

//table//td/a/text()

This prints ['WAXMAN, Henry Arnold'].
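
To see the effect without hitting the site, here is a minimal offline sketch. It assumes the server's HTML has no tbody, and the markup is made up for illustration; it is not the real bioguide response. It also uses the standard library's xml.etree.ElementTree as a stand-in for lxml so it runs with no extra packages. ElementTree has no text() step, so the text is read from each element's .text instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup standing in for the server response.
# Note there is no <tbody>: browsers add it only when building the DOM.
RAW_HTML = """
<html><body><center>
<table>
  <tr><th>Member Name</th><th>Birth-Death</th></tr>
  <tr><td><a href="#">WAXMAN, Henry Arnold</a></td><td>1939-</td></tr>
</table>
</center></body></html>
"""

root = ET.fromstring(RAW_HTML)  # root is the <html> element

# Path in the style of the one copied from the browser:
# it requires a <tbody> that the raw markup does not contain.
copied = root.findall("./body/center/table/tbody/tr/td/a")

# Relaxed path in the spirit of //table//td/a:
# any <a> directly inside a <td> anywhere inside a <table>.
relaxed = root.findall(".//table//td/a")

copied_names = [a.text for a in copied]
relaxed_names = [a.text for a in relaxed]

print(copied_names)   # []
print(relaxed_names)  # ['WAXMAN, Henry Arnold']
```

With lxml the same idea applies: dropping tbody from the path, or using the shorter //table//td/a/text(), avoids depending on an element that only exists in the browser's DOM.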

Upvotes: 1
