Reputation: 11
html=
"""<div class="slick-list"><div class="slick-track" style="width: 1380px; opacity: 1; transform: translate3d(0px, 0px, 0px);"><div data-index="0" class="slick-slide slick-active slick-current" tabindex="-1" aria-hidden="false" style="outline: none; width: 230px;"><div><div data-courseid="567828" class="course-discovery-unit--card-margin--2TVw4 merchandising-course-card--card--2UfMa"><a href="/course/complete-python-bootcamp/" data-purpose="merchandising-course-card-body-567828" target="_self" class="merchandising-course-card--mask--2-b-d"><div class="merchandising-course-card--card-header--89z8L"><img class="merchandising-course-card--course-image--3G7Kh" alt="" width="240" height="135" src="https://img-a.udemycdn.com/course/240x135/567828_67d0.jpg" srcset="https://img-a.udemycdn.com/course/240x135/567828_67d0.jpg 1x, https://img-a.udemycdn.com/course/480x270/567828_67d0.jpg 2x"></div><div class="merchandising-course-card--card-body--3OpAH"><div><div class="merchandising-course-card--course-title--2Ob4m" data-purpose="course-card-title">Complete Python Bootcamp: Go from zero to hero in Python 3</div>"""
I want to extract the link and the title. Expected output:
title=Complete Python Bootcamp: Go from zero to hero in Python 3
link=/course/complete-python-bootcamp/
Here is my code:
data=soup.findAll("div",{"class":"slick-list"})
print(data)
for link in data:
    for a in link.findAll("a"):
        print(a.title,a.href)
Upvotes: 0
Views: 257
Reputation: 1288
A working solution based on your code (still using findAll):
from bs4 import BeautifulSoup
html= """<div class="slick-list"><div class="slick-track" style="width: 1380px; opacity: 1; transform: translate3d(0px, 0px, 0px);"><div data-index="0" class="slick-slide slick-active slick-current" tabindex="-1" aria-hidden="false" style="outline: none; width: 230px;"><div><div data-courseid="567828" class="course-discovery-unit--card-margin--2TVw4 merchandising-course-card--card--2UfMa"><a href="/course/complete-python-bootcamp/" data-purpose="merchandising-course-card-body-567828" target="_self" class="merchandising-course-card--mask--2-b-d"><div class="merchandising-course-card--card-header--89z8L"><img class="merchandising-course-card--course-image--3G7Kh" alt="" width="240" height="135" src="https://img-a.udemycdn.com/course/240x135/567828_67d0.jpg" srcset="https://img-a.udemycdn.com/course/240x135/567828_67d0.jpg 1x, https://img-a.udemycdn.com/course/480x270/567828_67d0.jpg 2x"></div><div class="merchandising-course-card--card-body--3OpAH"><div><div class="merchandising-course-card--course-title--2Ob4m" data-purpose="course-card-title">Complete Python Bootcamp: Go from zero to hero in Python 3</div>"""
soup = BeautifulSoup(html, 'html.parser')
data=soup.findAll("div",{"class":"slick-list"})
#print(data)
for div in data:
    for a in div.findAll("a"):
        # a.text is the text inside this link only; attributes are read with .get()
        print(a.text,a.get('href'))
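As far as I can tell, the original loop printed None because dotted access on a BeautifulSoup tag (a.title, a.href) searches for a child tag with that name instead of reading an attribute. Here is a minimal sketch of the difference; the shortened snippet string is my own stand-in for the real page, not the actual Udemy markup:

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the real markup (an assumption, for illustration only)
snippet = (
    '<a href="/course/complete-python-bootcamp/">'
    '<div data-purpose="course-card-title">Complete Python Bootcamp</div>'
    '</a>'
)
soup = BeautifulSoup(snippet, "html.parser")
a = soup.find("a")

# Dotted access looks for a child tag named <href> or <title>; neither exists here
child_href = a.href
child_title = a.title

# Attributes are read with .get() (None if missing) or subscripting (KeyError if missing)
href_via_get = a.get("href")
href_via_subscript = a["href"]

# The visible title is the text content of the tag
title_text = a.get_text(strip=True)

print(child_href, child_title)
print(href_via_get, title_text)
```

Using .get() is the safer choice when some links might lack an href.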
Upvotes: 0
Reputation: 204
from bs4 import BeautifulSoup
html="""<div class="slick-list"><div class="slick-track" style="width: 1380px; opacity: 1; transform: translate3d(0px, 0px, 0px);"><div data-index="0" class="slick-slide slick-active slick-current" tabindex="-1" aria-hidden="false" style="outline: none; width: 230px;"><div><div data-courseid="567828" class="course-discovery-unit--card-margin--2TVw4 merchandising-course-card--card--2UfMa"><a href="/course/complete-python-bootcamp/" data-purpose="merchandising-course-card-body-567828" target="_self" class="merchandising-course-card--mask--2-b-d"><div class="merchandising-course-card--card-header--89z8L"><img class="merchandising-course-card--course-image--3G7Kh" alt="" width="240" height="135" src="https://img-a.udemycdn.com/course/240x135/567828_67d0.jpg" srcset="https://img-a.udemycdn.com/course/240x135/567828_67d0.jpg 1x, https://img-a.udemycdn.com/course/480x270/567828_67d0.jpg 2x"></div><div class="merchandising-course-card--card-body--3OpAH"><div><div class="merchandising-course-card--course-title--2Ob4m" data-purpose="course-card-title">Complete Python Bootcamp: Go from zero to hero in Python 3</div>"""
soup = BeautifulSoup(html, 'html.parser')
print('title='+soup.find("div",{"data-purpose":"course-card-title"}).text)
print('link='+soup.find("a").get('href'))
I hope this answers your question.
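If the carousel holds more than one course, soup.find will only return the first match. A possible sketch that pairs each title with its own link is shown below. It assumes every card is a div carrying a data-courseid attribute, as in your snippet; the two-card HTML and the extract_courses helper are made up for illustration:

```python
from bs4 import BeautifulSoup

# Hypothetical two-card markup modeled on the structure in the question
html = """
<div class="slick-list">
  <div data-courseid="1"><a href="/course/a/">
    <div data-purpose="course-card-title">Course A</div></a></div>
  <div data-courseid="2"><a href="/course/b/">
    <div data-purpose="course-card-title">Course B</div></a></div>
</div>
"""

def extract_courses(markup):
    """Return a list of (title, link) tuples, one per course card."""
    soup = BeautifulSoup(markup, "html.parser")
    courses = []
    # Treat each div with a data-courseid attribute as one card
    for card in soup.select("div[data-courseid]"):
        link = card.find("a", href=True)
        title = card.find("div", {"data-purpose": "course-card-title"})
        # Skip cards that are missing either piece
        if link is None or title is None:
            continue
        courses.append((title.get_text(strip=True), link["href"]))
    return courses

courses = extract_courses(html)
for title, link in courses:
    print("title=" + title)
    print("link=" + link)
```

Searching inside each card keeps a title from being matched with another course's link.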
Upvotes: 2