I am scraping this page with Python and BeautifulSoup (bs4): https://www.americanexpress.com/in/credit-cards/smart-earn-credit-card/?linknav=in-amex-cardshop-allcards-learn-SmartEarnCreditCard-carousel
I'm grabbing the key benefits from it with the following code:
from urllib.request import urlopen
from bs4 import BeautifulSoup

link = 'https://www.americanexpress.com/in/credit-cards/smart-earn-credit-card/?linknav=in-amex-cardshop-allcards-learn-SmartEarnCreditCard-carousel'
html = urlopen(link)
soup = BeautifulSoup(html, 'lxml')
details = []
for span in soup.select(".why-amex__subtitle span"):
    # title text, then the text of the first <span> that follows it
    details.append(f'{span.get_text(strip=True)}: {span.find_next("span").get_text(strip=True)}')
print(details)
Output
['Accelerated Earn Rate: Earn 10X Membership Rewards® Points2on your spending on Flipkart and Uber and earn 5X Membership Rewards Points2on Amazon, Swiggy, BookMyShow and more.', 'Welcome Bonus: Rs. 500 cashback as Welcome Gift on eligible spends1of Rs. 10,000 in the first 90 days of Cardmembership', 'Renewal Fee Waiver: Get a renewal fee waiver on eligible spends3of Rs.40,000 and above in the previous year of Cardmembership', 'AMERICAN EXPRESS EMI: Convert purchases into']
The last item in this list is incomplete because a hyperlink sits in the middle of its description: the text is split across a <span>, an <a> and a second <span>, and find_next("span") only returns the first of them.
Below is the HTML for that item:
<div class="why-amex__col"><span class="icons why-amex__lrgIcon icon-Amex-Icons-2016-85"></span><h4 class="why-amex__subtitle"><div><span>AMERICAN EXPRESS EMI</span></div></h4><div class="why-amex__copy"><div class="description_text"><div><span>Convert purchases into </span><a href="https://www.americanexpress.com/india/membershiprewards/cardmember_offers/viewmore.html" target="_blank">EMI</a><span> at the point of sale with an interest rate as low as 12% p.a. and zero foreclosure charges</span></div></div></div></div>
I'd like to get the full description of the last item without losing any of its text.
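To reproduce the truncation without fetching the live page, here is a minimal sketch that runs the same loop on just the snippet above. It assumes bs4 is installed, swaps lxml for Python's built-in "html.parser" so no extra parser is needed, and the HTML variable name is only illustrative. The description comes out as "Convert purchases into" because find_next("span") stops at the first <span>.

```python
from bs4 import BeautifulSoup

# The "AMERICAN EXPRESS EMI" item copied from the question (illustrative constant),
# parsed offline instead of via urlopen(link).
HTML = (
    '<div class="why-amex__col"><span class="icons why-amex__lrgIcon icon-Amex-Icons-2016-85"></span>'
    '<h4 class="why-amex__subtitle"><div><span>AMERICAN EXPRESS EMI</span></div></h4>'
    '<div class="why-amex__copy"><div class="description_text"><div>'
    '<span>Convert purchases into </span>'
    '<a href="https://www.americanexpress.com/india/membershiprewards/cardmember_offers/viewmore.html" target="_blank">EMI</a>'
    '<span> at the point of sale with an interest rate as low as 12% p.a. and zero foreclosure charges</span>'
    '</div></div></div></div>'
)

soup = BeautifulSoup(HTML, "html.parser")
details = []
for span in soup.select(".why-amex__subtitle span"):
    # find_next("span") returns only the first <span> after the title,
    # so the <a> and the second <span> are never read.
    details.append(f'{span.get_text(strip=True)}: {span.find_next("span").get_text(strip=True)}')

print(details)
```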
Upvotes: 0
Views: 80
Reputation: 691
Instead of storing finished strings, append the element that holds the description (the bs4 Tag, i.e. its inner HTML) to details,
and then loop through its child tags to construct your text.
Something like:
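texts = []
for detail in details:  # each detail is the element holding the description, not a string
    # Strip each direct child (<span>, <a>, <span>) and rejoin with single spaces;
    # concatenating the stripped pieces directly would produce "intoEMIat".
    parts = [tag.get_text(strip=True) for tag in detail.find_all(recursive=False)]
    texts.append(" ".join(parts))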
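The snippet above assumes details already holds elements rather than strings. Here is a minimal sketch of the whole flow on the HTML from the question, parsed offline with "html.parser". It assumes each description lives in the innermost <div> of the next div.description_text, which matches that snippet but has not been checked against every item on the live page; the names HTML, titles and benefits are illustrative.

```python
from bs4 import BeautifulSoup

# Illustrative copy of the item from the question; on the live page you
# would pass urlopen(link) to BeautifulSoup instead.
HTML = (
    '<div class="why-amex__col"><span class="icons why-amex__lrgIcon icon-Amex-Icons-2016-85"></span>'
    '<h4 class="why-amex__subtitle"><div><span>AMERICAN EXPRESS EMI</span></div></h4>'
    '<div class="why-amex__copy"><div class="description_text"><div>'
    '<span>Convert purchases into </span>'
    '<a href="https://www.americanexpress.com/india/membershiprewards/cardmember_offers/viewmore.html" target="_blank">EMI</a>'
    '<span> at the point of sale with an interest rate as low as 12% p.a. and zero foreclosure charges</span>'
    '</div></div></div></div>'
)

soup = BeautifulSoup(HTML, "html.parser")

titles = []
details = []  # bs4 Tag objects, not strings
for span in soup.select(".why-amex__subtitle span"):
    titles.append(span.get_text(strip=True))
    # Assumption: the description is the first <div> inside the next div.description_text.
    details.append(span.find_next("div", class_="description_text").find("div"))

texts = []
for detail in details:
    # Direct children here are <span>, <a>, <span>; strip each, then rejoin with spaces.
    parts = [tag.get_text(strip=True) for tag in detail.find_all(recursive=False)]
    texts.append(" ".join(parts))

benefits = [f"{title}: {text}" for title, text in zip(titles, texts)]
print(benefits)
```

Because find_all(recursive=False) returns only tags, a bare text node sitting directly inside that <div> would be skipped. detail.get_text(" ", strip=True) joins every text node with a single space instead, and gives the same string for this snippet.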
Upvotes: 1