user3000538

Reputation: 203

Python HTML web scraping

I'm trying to write a Python program that parses the following page and extracts the card brand and sub-brand for a given card BIN: https://www.cardbinlist.com/search.html?bin=371793. The following snippet retrieves the card type:

import requests
from lxml import html

page = requests.get('https://www.cardbinlist.com/search.html?bin=371793')
tree = html.fromstring(page.content)
print("card type: ", tree.xpath("//td//following::td[7]")[0].text)

However, I'm not sure how to get the brand using similar logic. Given this markup:

<th>Brand (Financial Service)</th>
<td><a href="/AMEX-bin-list.html" target="_blank">AMEX</a></td>

the expression

tree.xpath("//td//following::td[5]")[0].text

returns None.

Upvotes: 0

Views: 147

Answers (1)

SanthoshSolomon

Reputation: 1402

I would suggest using BeautifulSoup, as its selectors are often more convenient than XPath expressions.

With BeautifulSoup, the code for your problem would be:

import requests
from bs4 import BeautifulSoup    

page = requests.get('https://www.cardbinlist.com/search.html?bin=371793')
soup = BeautifulSoup(page.content, 'html.parser')
# Select the <th> element whose text is 'Brand (Financial Service)'
brand_parent = soup.find('th', string='Brand (Financial Service)')
# The brand is in the <td> that follows it; output: AMEX
brand = brand_parent.find_next_sibling('td').text
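
If the header is ever missing, soup.find returns None and the chained call raises an AttributeError. A minimal sketch of a guarded lookup follows. It runs against an inline fragment modeled on the markup in the question rather than the live page; the SAMPLE string, its "Card Type" row, and the lookup helper are hypothetical names used only for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment modeled on the row quoted in the question;
# the "Card Type" row is invented for illustration.
SAMPLE = """
<table>
  <tr>
    <th>Brand (Financial Service)</th>
    <td><a href="/AMEX-bin-list.html" target="_blank">AMEX</a></td>
  </tr>
  <tr>
    <th>Card Type</th>
    <td>CREDIT</td>
  </tr>
</table>
"""


def lookup(soup, header):
    """Return the text of the <td> following the <th> with this exact text, or None."""
    th = soup.find('th', string=header)
    if th is None:
        return None
    td = th.find_next_sibling('td')
    # get_text gathers text from nested tags such as <a> as well
    return td.get_text(strip=True) if td is not None else None


soup = BeautifulSoup(SAMPLE, 'html.parser')
brand = lookup(soup, 'Brand (Financial Service)')
print(brand)
```

The same helper can be pointed at page.content from the request above, assuming the live page uses the same th/td layout.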

If you want to stay with XPath,

change the expression to //td//following::td[5]/a and try again. The brand text sits inside the <a> element, so .text on the enclosing <td> is None.
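
Positional expressions such as td[5] break as soon as the page adds or reorders a row. As an alternative sketch (not the approach from the question), the XPath can be anchored on the header text instead. The example below uses an inline fragment modeled on the quoted markup; SAMPLE, its "Card Type" row, and the cell_for_header helper are hypothetical.

```python
from lxml import html

# Hypothetical fragment modeled on the row quoted in the question;
# the "Card Type" row is invented for illustration.
SAMPLE = """
<table>
  <tr>
    <th>Brand (Financial Service)</th>
    <td><a href="/AMEX-bin-list.html" target="_blank">AMEX</a></td>
  </tr>
  <tr>
    <th>Card Type</th>
    <td>CREDIT</td>
  </tr>
</table>
"""


def cell_for_header(tree, header):
    """Return the text of the first <td> sibling after the matching <th>, or None."""
    # $h is an XPath variable; lxml binds it from the keyword argument
    cells = tree.xpath(
        '//th[normalize-space()=$h]/following-sibling::td[1]', h=header
    )
    # text_content() includes text from child elements such as <a>
    return cells[0].text_content().strip() if cells else None


tree = html.fromstring(SAMPLE)
brand = cell_for_header(tree, "Brand (Financial Service)")
print(brand)
```

Using text_content() rather than .text avoids the None result from the question, because it also collects text held in child elements.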

Read the following answers to help choose a scraping method:

Xpath vs DOM vs BeautifulSoup vs lxml vs other Which is the fastest approach to parse a webpage?

Parsing HTML in python - lxml or BeautifulSoup? Which of these is better for what kinds of purposes?

Upvotes: 2
