Reputation: 203
I'm trying to write a Python program that parses the following page and extracts the card brand and sub-brand for a given card BIN: https://www.cardbinlist.com/search.html?bin=371793. The following code snippet retrieves the card type.
import requests
from lxml import html
page = requests.get('https://www.cardbinlist.com/search.html?bin=371793')
tree = html.fromstring(page.content)
print("card type: ", tree.xpath("//td//following::td[7]")[0].text)
However, I'm not sure how to get the brand using similar logic. Given the markup
<th>Brand (Financial Service)</th>
<td><a href="/AMEX-bin-list.html" target="_blank">AMEX</a></td>
the expression
tree.xpath("//td//following::td[5]")[0].text
returns None.
Upvotes: 0
Views: 147
Reputation: 1402
I would suggest using BeautifulSoup, since its CSS selectors are more convenient than XPath expressions.
With BeautifulSoup, the code for your problem would be:
import requests
from bs4 import BeautifulSoup
page = requests.get('https://www.cardbinlist.com/search.html?bin=371793')
soup = BeautifulSoup(page.content, 'html.parser')
brand_parent = soup.find('th', string='Brand (Financial Service)')  # the <th> element whose text is 'Brand (Financial Service)'
brand = brand_parent.find_next_sibling('td').text  # output: AMEX
If you want to stick with XPath, note that the brand text is inside the <a> element, so .text on the <td> itself is None. Change the XPath to
//td//following::td[5]/a
and try again.
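To illustrate why .text on the <td> comes back as None, here is a minimal sketch with lxml that runs against an inline sample instead of the live page. Only the <th>/<td> pair is taken from the question; the surrounding <table> and <tr> markup, the second row, and the helper name cell_for_label are assumptions made for illustration. It also anchors on the header label rather than on a positional td[5], which may be less fragile if the page layout changes.

```python
from lxml import html

# Hypothetical stand-in for the page: only the <th>/<td> pair comes from the
# question; the <table>/<tr> wrapper and the second row are assumed.
SAMPLE = """
<table>
  <tr>
    <th>Brand (Financial Service)</th>
    <td><a href="/AMEX-bin-list.html" target="_blank">AMEX</a></td>
  </tr>
  <tr>
    <th>Some other field</th>
    <td>plain text</td>
  </tr>
</table>
"""


def cell_for_label(tree, label):
    """Return the first <td> sibling after the <th> whose text is `label`, or None."""
    cells = tree.xpath(
        "//th[normalize-space()=$label]/following-sibling::td[1]", label=label
    )
    return cells[0] if cells else None


tree = html.fromstring(SAMPLE)
cell = cell_for_label(tree, "Brand (Financial Service)")

print(cell.text)                    # None: the text belongs to the <a> child
print(cell.text_content().strip())  # AMEX: text of the cell and its descendants
print(cell.xpath("./a/text()"))     # ['AMEX']
```

For the real page, the same helper could be called on html.fromstring(page.content), assuming the live markup matches the fragment shown in the question.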
For help choosing a scraping approach, see the answers to the following questions:
Xpath vs DOM vs BeautifulSoup vs lxml vs other Which is the fastest approach to parse a webpage?
Parsing HTML in python - lxml or BeautifulSoup? Which of these is better for what kinds of purposes?
Upvotes: 2