mascai

Reputation: 1872

Can't find video tag

I am trying to fetch a URL from a tag in the HTML of the site https://95jun.kinoxor.pro/984-univer-13-let-spustja-2024-07-06-19-54.html

The site is tricky. You can open it from this page (the first or second search result): https://yandex.ru/search/?text=https%3A%2F%2Fkinokubok.pro%2F232-univer-13-let-spustja-2024-06-25-19-51.html&lr=21653

I am looking for this URL: <iframe src="https://api.stiven-king.com/storage.html" ...

Proof that the URL exists: (screenshot)

How can I fetch the content of this HTML tag?

My code:

import seleniumwire.undetected_chromedriver as uc
import time

options = uc.ChromeOptions()
options.add_argument('--ignore-ssl-errors=yes')
options.add_argument('--ignore-certificate-errors')

driver = uc.Chrome(options=options)

def interceptor(request):
    del request.headers['Referer'] 
    request.headers['Referer'] = 'https://yandex.ru/'

url = "https://125jun.kinoamor.pro/251-univer-13-let-spustja-2024-06-27-19-51.html"

driver.request_interceptor = interceptor
driver.get(url)

time.sleep(3)
iframe_tag_elements = driver.find_elements("xpath", "//iframe")
print(f"FOUND VIDEO TAGS: {len(iframe_tag_elements)}") # prints 7
for iframe_elem in iframe_tag_elements:
    video_url = iframe_elem.get_attribute("src")
    if video_url:
        print("XXX_ ", video_url)

**PROBLEM**: the URL "https://api.stiven-king.com/storage.html" is not printed. I also don't see the URL in driver.page_source.

I tried sleeping and scrolling the page, but it didn't help.

I also tried driver.switch_to.frame(iframe_elem) and then searching for iframes again.

Upvotes: 1

Views: 184

Answers (3)

Guy

Reputation: 50919

As suggested in the other answers, you need to switch to the <iframe> containing the link you are looking for. But instead of taking the first <iframe>, you can provide a more specific locator:

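from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as ec

# replaced the url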
url = "https://01jul.kinokubok.pro/232-univer-13-let-spustja-2024-07-03-20-19.html"

driver.request_interceptor = interceptor
driver.get(url)

WebDriverWait(driver, 30).until(ec.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR, "#dle-content .video-box > iframe:not([src])")))
iframe_tag_elements = driver.find_elements("xpath", "//iframe")
print(f"FOUND VIDEO TAGS: {len(iframe_tag_elements)}")
for iframe_elem in iframe_tag_elements:
    video_url = iframe_elem.get_attribute("src")
    if video_url:
        print("XXX_ ", video_url)

Output

FOUND VIDEO TAGS: 1
XXX_  https://api.stiven-king.com/storage.html

If you want the src values of all the <iframe>s, you can build a recursive function to extract them.

To make the page load faster you can set page_load_strategy to 'eager', but be aware that you might have to add a wait if the script runs ahead of the page.
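One way to add such a wait is a custom condition that polls until an <iframe> with a matching src shows up in the current context. This is only a sketch: `iframe_with_src` and the `fragment` parameter are hypothetical names, not part of Selenium, and the condition only looks at the frames of the context the driver is currently in.

```python
def iframe_with_src(fragment):
    """Build a wait condition (hypothetical helper, not a Selenium API).

    The returned callable gives back the first <iframe> in the current
    context whose src contains `fragment`, or False if none is found yet.
    """
    def condition(driver):
        for frame in driver.find_elements("xpath", "//iframe"):
            src = frame.get_attribute("src") or ""
            if fragment in src:
                return frame
        return False

    return condition
```

It could then be passed to the wait like any built-in condition, for example `WebDriverWait(driver, 10).until(iframe_with_src("youtube.com"))`, since `until` keeps calling the condition with the driver until it returns a truthy value.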

Complete code

from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as ec
import seleniumwire.undetected_chromedriver as uc


options = uc.ChromeOptions()
options.add_argument('--ignore-ssl-errors=yes')
options.add_argument('--ignore-certificate-errors')
options.page_load_strategy = 'eager'

driver = uc.Chrome(options=options)


def interceptor(request):
    del request.headers['Referer']
    request.headers['Referer'] = 'https://yandex.ru/'


def get_frame_data(frames):
    # Collect the src of each frame, then recurse into its nested frames.
    src = []
    for frame in frames:
        video_url = frame.get_attribute("src")
        if video_url:
            src.append(video_url)

        driver.switch_to.frame(frame)
        child_frames = driver.find_elements("xpath", "//iframe")
        if child_frames:
            src.extend(get_frame_data(child_frames))
        # Go up one level only, so sibling frames at this depth stay reachable.
        driver.switch_to.parent_frame()

    return src


url = "https://01jul.kinokubok.pro/232-univer-13-let-spustja-2024-07-03-20-19.html"

driver.request_interceptor = interceptor
driver.get(url)

wait = WebDriverWait(driver, 10)
wait.until(ec.visibility_of_element_located(("id", "grid")))
wait.until(ec.visibility_of_element_located(("class name", "karusel")))

iframe_tag_elements = driver.find_elements("xpath", "//iframe")
all_src = get_frame_data(iframe_tag_elements)
for sr in all_src:
    print("XXX_ ", sr)

Output 1:

XXX_  https://api.marts.ws/embed/movie/74360
XXX_  https://loosening-as.allarknow.online/?token_movie=be2b9578d8cae35323bb199f888be1&token=b5c08f668c592ee23d32031d27de44
XXX_  https://www.youtube.com/embed/mthO33phh9U
XXX_  https://yastatic.net/share2/v-1.16.0/frame.html?namespace=ya-share2.0.7255632935282506
XXX_  https://yastatic.net/share2/v-1.16.0/frame.html?namespace=ya-share2.0.027842698862642123

Output 2:

XXX_  https://api.stiven-king.com/storage.html
XXX_  https://loosening-as.allarknow.online/?token_movie=be2b9578d8cae35323bb199f888be1&token=b5c08f668c592ee23d32031d27de44
XXX_  https://www.youtube.com/embed/mthO33phh9U
XXX_  https://yastatic.net/share2/v-1.16.0/frame.html?namespace=ya-share2.0.9638742292394189
XXX_  https://yastatic.net/share2/v-1.16.0/frame.html?namespace=ya-share2.0.8570699995377056
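Since the first src differs between runs, it may also help to look at the network traffic rather than the DOM. This is a different technique from the frame walk above: selenium-wire records the requests the browser makes and exposes them as `driver.requests`, each with a `url` attribute. The helper below is a minimal sketch; `captured_urls` and `fragment` are hypothetical names, and it assumes the request for the nested frame has already been made by the time it is called.

```python
def captured_urls(requests, fragment):
    """Return the URLs of captured requests that contain `fragment`.

    `requests` is any iterable of objects with a `.url` attribute,
    such as selenium-wire's `driver.requests`. Order is preserved
    and duplicates are dropped.
    """
    found = []
    for request in requests:
        if fragment in request.url and request.url not in found:
            found.append(request.url)
    return found
```

With the driver from the complete code, a call such as `captured_urls(driver.requests, "storage.html")` would list the matching URLs regardless of how deeply the frame is nested.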

Upvotes: 1

GTK

Reputation: 1906

The <iframe> with src="https://api.stiven-king.com/storage.html" is nested within another iframe. To locate it, you have to switch context to its parent frame first:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get('https://125jun.kinoamor.pro/251-univer-13-let-spustja-2024-07-05-20-52.html')

WebDriverWait(driver, 30).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR, 'iframe')))
src = driver.find_element(By.CSS_SELECTOR, 'iframe').get_attribute('src')
print(src)

Upvotes: 1

Techrookie89

Reputation: 507

The reason you are unable to find the video tag on that page is that the content is loaded via an iframe. My suggestion would be to:

  1. Switch your browser context to the iframe
  2. Search for the video tag
  3. Do your thing...
  4. Switch out of the iframe

The switching can be done using the below methods:

Switch to an iFrame

video_frames = driver.find_elements("xpath", "//iframe")
driver.switch_to.frame(video_frames[0])  # switch_to.frame() takes a single frame, not a list

Switch out of an iFrame

driver.switch_to.default_content()
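The four steps above can be wrapped in a small context manager so that the switch back always happens, even if the search raises. This is a sketch under assumptions: `inside_frame` and `video_sources` are hypothetical helper names, the XPath `//video` assumes the player is a plain <video> element, and it uses `switch_to.parent_frame()` instead of `default_content()` so that it also works one level down.

```python
from contextlib import contextmanager


@contextmanager
def inside_frame(driver, frame):
    """Switch into `frame`, then return to its parent when the block ends."""
    driver.switch_to.frame(frame)          # step 1: enter the iframe
    try:
        yield driver                       # steps 2-3: caller does the work
    finally:
        driver.switch_to.parent_frame()    # step 4: leave the iframe


def video_sources(driver):
    """Collect the src of every <video> found directly inside each iframe."""
    sources = []
    for frame in driver.find_elements("xpath", "//iframe"):
        with inside_frame(driver, frame):
            for video in driver.find_elements("xpath", "//video"):
                src = video.get_attribute("src")
                if src:
                    sources.append(src)
    return sources
```

Only the first level of iframes is inspected here; deeper nesting would need the same pattern applied recursively.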

Upvotes: 1
