Python HTML web scraping

Question

I'm attempting to write a Python program that parses the following page and extracts the card sub-brand and brand for a given card BIN: https://www.cardbinlist.com/search.html?bin=371793. The following code snippet retrieves the card type.

import requests
from lxml import html

page = requests.get('https://www.cardbinlist.com/search.html?bin=371793')
tree = html.fromstring(page.content)
print("card type: ", tree.xpath("//td//following::td[7]")[0].text)

However, I'm not sure how to get the brand using similar logic. The page shows

Brand (Financial Service)
AMEX

but

tree.xpath("//td//following::td[5]")[0].text

returns None.
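A likely cause of the None, sketched on a hypothetical snapshot: in lxml, an element's .text is only the text that comes before its first child element. If the brand is wrapped in a link (the <td><a>AMEX</a></td> markup below is an assumption, not copied from the live page), the <td> has no such text.

```python
from lxml import html

# Hypothetical markup for the brand row; the live page's structure is an
# assumption, chosen only to reproduce the None result.
SNIPPET = """
<table>
  <tr><th>Brand (Financial Service)</th><td><a href="#">AMEX</a></td></tr>
</table>
"""

tree = html.fromstring(SNIPPET)
cell = tree.xpath("//td")[0]

# .text holds only the text that comes before the first child element.
# This <td> starts directly with <a>, so there is no such text.
print(cell.text)  # None

# Stepping into the <a> (what appending /a to the XPath does) reaches the value.
link = cell.xpath("./a")[0]
print(link.text)  # AMEX

# text_content() joins the text of the element and all of its descendants.
print(cell.text_content())  # AMEX
```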

SanthoshSolomon · Accepted Answer

I would suggest going with BeautifulSoup, as CSS selectors are more convenient than XPath expressions.

With BeautifulSoup, the code for your problem would be:

import requests
from bs4 import BeautifulSoup

page = requests.get('https://www.cardbinlist.com/search.html?bin=371793')
soup = BeautifulSoup(page.content, 'html.parser')
# Select the <th> element whose text is exactly 'Brand (Financial Service)'
brand_parent = soup.find('th', string='Brand (Financial Service)')
# The value sits in the adjacent <td>; get_text() includes text from nested tags such as <a>
brand = brand_parent.find_next_sibling('td').get_text(strip=True)  # O/P: AMEX
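To check the lookup without hitting the site, here is a minimal sketch run against a hypothetical HTML snapshot (the row markup and the "Other field" row are assumptions, not the page's real content). It shows the find/find_next_sibling approach from above next to an equivalent CSS selector; the :-soup-contains() pseudo-class is a soupsieve extension available in soupsieve 2.1 and later.

```python
from bs4 import BeautifulSoup

# Hypothetical snapshot of the results table; the live page may differ.
SNIPPET = """
<table>
  <tr><th>Other field</th><td>placeholder</td></tr>
  <tr><th>Brand (Financial Service)</th><td><a href="#">AMEX</a></td></tr>
</table>
"""

soup = BeautifulSoup(SNIPPET, "html.parser")

# Approach from the answer: find the <th> by its exact text, then step to the next <td>.
header = soup.find("th", string="Brand (Financial Service)")
brand = header.find_next_sibling("td").get_text(strip=True)

# Same lookup as a CSS selector: ":-soup-contains()" (soupsieve 2.1+) matches the
# header text, and "+ td" selects the <td> immediately following that <th>.
brand_css = soup.select_one(
    'th:-soup-contains("Brand (Financial Service)") + td'
).get_text(strip=True)

print(brand, brand_css)  # AMEX AMEX
```

Anchoring on the header text rather than a cell position keeps the lookup working if rows are added or reordered.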

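If you want to stick with XPath, change the expression to //td//following::td[5]/a and try again. The brand text appears to be wrapped in an <a> element, so the <td> itself has no text of its own and .text returns None.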
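A label-based XPath is a swapped-in alternative to the positional td[5] index. Here is a sketch against a hypothetical snapshot (the markup is an assumption). It selects the <td> that follows the "Brand (Financial Service)" header. normalize-space() tolerates stray whitespace around the label, and text_content() works whether or not the value is wrapped in <a>.

```python
from lxml import html

# Hypothetical snapshot; the real page's row order and markup are assumptions.
SNIPPET = """
<table>
  <tr><th>Other field</th><td>placeholder</td></tr>
  <tr><th>Brand (Financial Service) </th><td><a href="#">AMEX</a></td></tr>
</table>
"""

tree = html.fromstring(SNIPPET)

# Match the header by its whitespace-normalized text, then take the first <td> after it.
BRAND_CELL = "//th[normalize-space()='Brand (Financial Service)']/following-sibling::td[1]"

# Option 1: append /a, as in the answer, and read the link's .text.
link_text = tree.xpath(BRAND_CELL + "/a")[0].text

# Option 2: read all text under the <td>, with or without a nested <a>.
brand = tree.xpath(BRAND_CELL)[0].text_content().strip()

print(link_text, brand)  # AMEX AMEX
```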

The following questions may help you choose a scraping method:

Xpath vs DOM vs BeautifulSoup vs lxml vs other Which is the fastest approach to parse a webpage?

Parsing HTML in python - lxml or BeautifulSoup? Which of these is better for what kinds of purposes?
