Lucian Chauvin

Reputation: 82

Using LXML To Return Title Text

I am working on a school project and using lxml and its .xpath() method to get the titles of the top videos for a YouTube search term that the user enters. My problem is that when I iterate over the top 5 results, I can't figure out how to return the actual title, no matter what I do. I have tried appending /text(), /string, and /title/text() to the XPath, since the text I am trying to get is in the title, but everything I try just returns an empty list [].

Here is my python code:

from lxml import html
import requests

string = input("Enter what you want to search up on Youtube: \n")
# str.replace returns a new string, so the result has to be assigned
string = string.replace(" ", "+")
page = requests.get('https://www.youtube.com/results?search_query=' + string)
tree = html.fromstring(page.content)
# XPath positions are 1-based, so iterate over 1..5 rather than 0..4
for x in range(1, 6):
  v = tree.xpath('/html/body/ytd-app/div/ytd-page-manager/ytd-search/div[1]/ytd-two-column-search-results-renderer/div/ytd-section-list-renderer/div[2]/ytd-item-section-renderer/div[3]/ytd-video-renderer[1]/div[1]/div/div[' + str(x) + ']/div/h3/a')
  print(v)

And here is what I am getting returned:

Enter what you want to search up on Youtube:
rainbow
[]
[]
[]
[]
[]

And this is the HTML that I am trying to pull the title text from:

<a id="video-title" class="yt-simple-endpoint style-scope ytd-video-renderer" title="Hide and Seek in Rainbow Six Siege... Let's Go!!" href="/watch?v=g8MM_RS7zmw" aria-label="Hide and Seek in Rainbow Six Siege... Let's Go!! by Get_Flanked 8 hours ago 21 minutes 54,654 views">
                Hide and Seek in Rainbow Six Siege... Let's Go!!
              </a>

Upvotes: 1

Views: 84

Answers (1)

jmunsch

Reputation: 24119

Consider using the YouTube Data API; there is a Python client library for it.
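As a rough illustration of the API route, here is a minimal sketch that builds a request URL for the Data API v3 search.list endpoint and pulls the titles out of a decoded JSON response, using only the standard library. The helper names build_search_url and extract_titles are hypothetical, YOUR_API_KEY is a placeholder, and sample_response is hand-written in the documented response shape rather than real API output.

```python
import html
from urllib.parse import urlencode

# Public REST endpoint of the YouTube Data API v3 search.list method.
SEARCH_ENDPOINT = "https://www.googleapis.com/youtube/v3/search"


def build_search_url(query, api_key, max_results=5):
    """Return a search.list URL for the given query (hypothetical helper)."""
    params = {
        "part": "snippet",
        "q": query,
        "type": "video",
        "maxResults": max_results,
        "key": api_key,
    }
    # urlencode escapes spaces and other special characters in the query
    return SEARCH_ENDPOINT + "?" + urlencode(params)


def extract_titles(response):
    """Pull the video titles out of a decoded search.list JSON response."""
    # Titles may come back HTML-escaped, so unescape them defensively
    return [
        html.unescape(item["snippet"]["title"])
        for item in response.get("items", [])
    ]


# Hand-written sample in the documented response shape, not real API output
sample_response = {
    "items": [
        {"snippet": {"title": "Hide and Seek in Rainbow Six Siege... Let&#39;s Go!!"}},
    ]
}

print(build_search_url("rainbow six", "YOUR_API_KEY"))
print(extract_titles(sample_response))
```

To fetch real results you would pass the URL to something like requests.get(url).json() with a valid API key and hand the decoded dictionary to extract_titles.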

Otherwise, if you want to scrape the page, you'll need a tool that can execute JavaScript. requests only downloads the HTML document; it doesn't run the JavaScript that builds the search results, so the elements your XPath targets are not in the response.
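To see that the XPath itself is not the issue, here is a small sketch that runs the same lxml query against two hard-coded strings. The rendered string is based on the snippet from the question; the unrendered string is only a simplified, assumed stand-in for what requests receives before any JavaScript has run.

```python
from lxml import html

# Markup as it looks in the browser after JavaScript has built the results
rendered = """
<html><body><div>
  <a id="video-title" title="Hide and Seek in Rainbow Six Siege... Let's Go!!" href="/watch?v=g8MM_RS7zmw">
    Hide and Seek in Rainbow Six Siege... Let's Go!!
  </a>
</div></body></html>
"""

# Assumed, simplified stand-in for the page before JavaScript runs
unrendered = "<html><body><ytd-app></ytd-app></body></html>"

# Select the title attribute of every link with id="video-title"
TITLE_XPATH = '//a[@id="video-title"]/@title'

titles_rendered = html.fromstring(rendered).xpath(TITLE_XPATH)
titles_unrendered = html.fromstring(unrendered).xpath(TITLE_XPATH)

# The visible link text is also available through text_content()
link = html.fromstring(rendered).xpath('//a[@id="video-title"]')[0]
link_text = link.text_content().strip()

print(titles_rendered)    # one title, taken from the title attribute
print(titles_unrendered)  # empty list, the element is simply not there
print(link_text)
```

Selecting by id with /@title is also far less brittle than a long absolute path that breaks whenever the page layout changes.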

For example, with Selenium:

import selenium.webdriver
from selenium.webdriver.common.by import By

options = selenium.webdriver.FirefoxOptions()
options.add_argument("--headless")

# Newer Selenium releases take options= instead of the deprecated firefox_options=
driver = selenium.webdriver.Firefox(options=options)

driver.get('https://www.youtube.com/results?search_query=montypython')

# Newer Selenium releases use find_elements(By..., ...) instead of find_elements_by_*
[x.text for x in driver.find_elements(By.XPATH, '//*[@id="video-title"]')]
[x.text for x in driver.find_elements(By.ID, 'video-title')]
print(dir(driver))

# how to get HTML tag attributes, for example href
[x.get_attribute("href") for x in driver.find_elements(By.ID, 'video-title')]

>>> [x.get_attribute('title') for x in driver.find_elements(By.ID, 'video-title')]
['Monty Python And The Holy Grail 1975 HD', 'Monty Python and the Holy Grail', "Monty Python's - The Funniest Joke in the World (la blague qui tue)", 'Argument', 'Monty Python - The Black Knight - Tis But A Scratch', 'Monty Python- Cheese Shop', 'Monty Python: The Parrot Sketch & The Lumberjack Song movie versions HQ', 'Biggus Dickus - Monty Python, Life of Brian.', 'Monty Python - Bridge of Death', 'Life of Brian 1979 (sub indo)', 'John Cleese - How To Irritate People 1968', 'Monty Python and The Holy Grail - Black Knight HD', 'Eric Idle - "Always Look On The Bright Side Of Life" - STEREO HQ', 'Monty pythons, Mr creosote, Full version,', 'Monty Python   Ministry of Silly Walks NL', 'Monty Python - careers advice', 'Monty Python and the Holy Grail - Bunny Attack Scene (HD)', 'Monty Python Society For Putting Things On Top of Other Things', 'Monty Python - Constitutional Peasants Scene (HD)']

Upvotes: 1
