WebScraping with Python / Selenium

Question

I'm trying to pull some data from YouTube, but I'm struggling to capture the right text. Here is my code:

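# Imports assumed from the names used below
from selenium import webdriver
from bs4 import BeautifulSoup as bs

username = "unboxtherapy"
driver = webdriver.Chrome('C:/Users/Chrome Web Driver/chromedriver.exe')
api_url = "https://www.youtube.com/user/"+username+"/about"
driver.get(api_url)
# .text on the <html> element gives only the visible text, with no tags
html = driver.find_element_by_tag_name('html')
soup=bs(html.text,'html.parser')
text=str(soup)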

In the example above, I'm trying to capture the description shown on the page.

soup

returns all the text on the page, i.e. the description that I want plus a lot of other text that I don't want.

text

returns all the following text:

"GB SIGN IN Unbox Therapy 13,802,667 subscribers JOIN SUBSCRIBE Twitter HOME VIDEOS PLAYLISTS COMMUNITY CHANNELS ABOUT Description Where products get naked. Here you will find a variety of videos showcasing the coolest products on the planet. From the newest smartphone to surprising gadgets and technology you never knew existed. It's all here on Unbox Therapy. Business / professional inquiries ONLY - business [at] unboxtherapy.com (please don't use YouTube inbox) Links Twitter Facebook Instagram The Official Website Stats Joined Dec 21, 2010 2,698,921,226 views OTHER COOL CHANNELS. Lew Later SUBSCRIBE Marques Brownlee SUBSCRIBE Jonathan Morrison SUBSCRIBE Austin Evans SUBSCRIBE DetroitBORG SUBSCRIBE LooneyTek SUBSCRIBE Soldier Knows Best SUBSCRIBE UrAvgConsumer SUBSCRIBE RELATED CHANNELS Linus Tech Tips SUBSCRIBE JerryRigEverything SUBSCRIBE Mrwhosetheboss SUBSCRIBE TechSmartt SUBSCRIBE"

Is there a way to capture just the description? Is that possible at all?

Thank you in advance to whoever can help me.

Best Wishes

KunduK · Accepted Answer

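Try the code below and let me know if it works. In your code, html.text returns only the visible text of the page, so BeautifulSoup has no tags left to search. driver.page_source returns the full HTML, which lets you select just the description element.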

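import re

import bs4 as bs
from selenium import webdriver

username = "unboxtherapy"
# Selenium 3 style call; newer Selenium versions take the driver path through a Service object
driver = webdriver.Chrome('C:/Users/Chrome Web Driver/chromedriver.exe')
api_url = "https://www.youtube.com/user/"+username+"/about"
driver.get(api_url)
# page_source is the rendered HTML with tags, not just the visible text
html = driver.page_source
soup=bs.BeautifulSoup(html,'html.parser')
# Keep only <yt-formatted-string> elements whose id contains "description"
findtext=soup.find_all('yt-formatted-string',id=re.compile("description"))
for txt in findtext:
    print(txt.text)
driver.quit()  # close the browser when done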

Output:

Where products get naked.

Here you will find a variety of videos showcasing the coolest products on the planet. From the newest smartphone to surprising gadgets and technology you never knew existed. It's all here on Unbox Therapy.

Business / professional inquiries ONLY - business [at] unboxtherapy.com
(please don't use YouTube inbox)
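
If you want to check the selection logic without launching a browser, here is a minimal sketch that applies the same filter (tag name plus a regex on the id) to a static snippet. It uses only the standard library's html.parser instead of BeautifulSoup, so it is a stand-in for the idea rather than the code above. The names TextByTagAndId, extract_text and SAMPLE_HTML are made up for this illustration, and the markup is a simplified guess at the page structure, not a copy of the real page.

```python
import re
from html.parser import HTMLParser


class TextByTagAndId(HTMLParser):
    """Collect the text of every <tag> whose id attribute matches a regex."""

    def __init__(self, tag, id_pattern):
        super().__init__()
        self.tag = tag
        self.id_pattern = re.compile(id_pattern)
        self.depth = 0      # > 0 while inside a matching element
        self.results = []   # one string per matching element

    def handle_starttag(self, tag, attrs):
        if self.depth:
            # Count nested same-name tags so the correct end tag closes the match
            if tag == self.tag:
                self.depth += 1
            return
        element_id = dict(attrs).get("id") or ""
        # search() matches anywhere in the id, not only the whole id
        if tag == self.tag and self.id_pattern.search(element_id):
            self.depth = 1
            self.results.append("")

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.results[-1] += data


def extract_text(html, tag="yt-formatted-string", id_pattern="description"):
    """Return the stripped text of each matching element, in document order."""
    parser = TextByTagAndId(tag, id_pattern)
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.results]


# Hypothetical, simplified markup; the real page is much larger.
SAMPLE_HTML = """
<div id="header"><yt-formatted-string id="subscriber-count">13,802,667 subscribers</yt-formatted-string></div>
<div id="description-container">
  <yt-formatted-string id="description">Where products get naked.</yt-formatted-string>
</div>
<yt-formatted-string id="views">2,698,921,226 views</yt-formatted-string>
"""

descriptions = extract_text(SAMPLE_HTML)
print(descriptions)
```

Note that the pattern is searched anywhere inside the id, so "description" would also match an id such as "description-container" on the same tag. BeautifulSoup applies a compiled regex the same way, so if the page has several such elements you may want an anchored pattern like "^description$" or a plain string id.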
