Reputation: 469
I'm trying to pull some data from YouTube, but I'm struggling to capture the text. Here is my code:
from selenium import webdriver
from bs4 import BeautifulSoup as bs

username = "unboxtherapy"
driver = webdriver.Chrome('C:/Users/Chrome Web Driver/chromedriver.exe')
api_url = "https://www.youtube.com/user/"+username+"/about"
driver.get(api_url)
html = driver.find_element_by_tag_name('html')
soup=bs(html.text,'html.parser')
text=str(soup)
In the example above, I'm trying to capture the description shown on the page.
soup
returns all the text on the page i.e. the description that I want + a ton of other things which I don't want.
text
returns all the following text:
"GB\nSIGN IN\nUnbox Therapy\n13,802,667 subscribers\nJOIN\nSUBSCRIBE\nTwitter\nHOME\nVIDEOS\nPLAYLISTS\nCOMMUNITY\nCHANNELS\nABOUT\nDescription\nWhere products get naked.\n\nHere you will find a variety of videos showcasing the coolest products on the planet. From the newest smartphone to surprising gadgets and technology you never knew existed. It's all here on Unbox Therapy.\n\nBusiness / professional inquiries ONLY - business [at] unboxtherapy.com\n(please don't use YouTube inbox)\nLinks\nTwitter Facebook Instagram The Official Website\nStats\nJoined Dec 21, 2010\n2,698,921,226 views\nOTHER COOL CHANNELS.\nLew Later\nSUBSCRIBE\nMarques Brownlee\nSUBSCRIBE\nJonathan Morrison\nSUBSCRIBE\nAustin Evans\nSUBSCRIBE\nDetroitBORG\nSUBSCRIBE\nLooneyTek\nSUBSCRIBE\nSoldier Knows Best\nSUBSCRIBE\nUrAvgConsumer\nSUBSCRIBE\nRELATED CHANNELS\nLinus Tech Tips\nSUBSCRIBE\nJerryRigEverything\nSUBSCRIBE\nMrwhosetheboss\nSUBSCRIBE\nTechSmartt\nSUBSCRIBE"
Is there a way to capture just the description? Is that possible at all?
Thank you in advance to whoever can help me.
Best Wishes
Upvotes: 0
Views: 115
Reputation: 185
Simple parsing can be completed using only selenium.
driver.get(api_url)
description = driver.find_element_by_id('description')
print(description.text)
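If you use Chrome, right-click the element and choose Inspect to find its tag name, id, or attribute value.
Then use the matching driver method (these are the Selenium 3 names; Selenium 4 replaces them with driver.find_element(By.ID, ...) and similar):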
driver.find_element_by_tag_name()
driver.find_elements_by_tag_name()
driver.find_element_by_id()
driver.find_elements_by_id()
driver.find_element_by_class_name()
driver.find_elements_by_class_name()
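If no locator turns out to be stable, one fallback is to slice the visible text the question already captures. The sketch below is only an illustration: it assumes the "Description" and "Links" headings each sit on their own line, as in the sample output from the question, and extract_section is a hypothetical helper, not part of Selenium.

```python
def extract_section(page_text, start="Description", end="Links"):
    """Return the lines between the `start` and `end` heading lines, or None."""
    lines = page_text.split("\n")
    try:
        begin = lines.index(start) + 1   # first line after the start heading
        stop = lines.index(end, begin)   # first end heading after that point
    except ValueError:
        return None                      # one of the headings is missing
    return "\n".join(lines[begin:stop]).strip()


# Shortened copy of the visible text shown in the question.
sample = (
    "ABOUT\nDescription\nWhere products get naked.\n\n"
    "Business / professional inquiries ONLY - business [at] unboxtherapy.com\n"
    "(please don't use YouTube inbox)\nLinks\nTwitter Facebook\nStats"
)
print(extract_section(sample))
```

This breaks if the description itself contains a line that reads exactly "Links", so a locator-based approach is preferable whenever one works.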
Upvotes: 1
Reputation: 33384
Try the code below. Let me know if it works.
import re
import bs4 as bs
from selenium import webdriver

username = "unboxtherapy"
driver = webdriver.Chrome('C:/Users/Chrome Web Driver/chromedriver.exe')
api_url = "https://www.youtube.com/user/"+username+"/about"
driver.get(api_url)
# page_source is the rendered HTML, not just the visible text
html = driver.page_source
soup=bs.BeautifulSoup(html,'html.parser')
# keep only yt-formatted-string tags whose id contains "description"
findtext=soup.find_all('yt-formatted-string',id=re.compile("description"))
for txt in findtext:
    print(txt.text)
Output :
Where products get naked.
Here you will find a variety of videos showcasing the coolest products on the planet. From the newest smartphone to surprising gadgets and technology you never knew existed. It's all here on Unbox Therapy.
Business / professional inquiries ONLY - business [at] unboxtherapy.com
(please don't use YouTube inbox)
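To see what that filter does without opening a browser, here is a rough standard-library sketch of the same idea: keep the text of yt-formatted-string tags whose id matches a regular expression. It swaps BeautifulSoup for Python's html.parser, the DescriptionExtractor class is hypothetical, and the HTML snippet is made up for illustration rather than copied from YouTube.

```python
import re
from html.parser import HTMLParser

TAG = "yt-formatted-string"


class DescriptionExtractor(HTMLParser):
    """Collect text inside TAG elements whose id matches a pattern."""

    def __init__(self, id_pattern):
        super().__init__()
        self.id_pattern = re.compile(id_pattern)
        self.depth = 0       # > 0 while inside a matching element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag != TAG:
            return
        if self.depth:
            self.depth += 1  # nested TAG inside a match
        elif self.id_pattern.search(dict(attrs).get("id") or ""):
            self.depth = 1   # entering a matching element

    def handle_endtag(self, tag):
        if tag == TAG and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


# Made-up markup: only the second element has an id containing "description".
html = (
    '<yt-formatted-string id="title">Unbox Therapy</yt-formatted-string>'
    '<yt-formatted-string id="description">Where products get naked.'
    '</yt-formatted-string>'
)
parser = DescriptionExtractor("description")
parser.feed(html)
parser.close()
print("".join(parser.chunks))
```

Because the pattern is applied as a search, any id that merely contains "description" also matches, which is why the answer's loop may print more than one element.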
Upvotes: 1