Reputation: 1872
I am trying to fetch the URL of a tag from the HTML of the site https://95jun.kinoxor.pro/984-univer-13-let-spustja-2024-07-06-19-54.html
The site is tricky. You can open it from this page (it is the first or second result of this search engine query): https://yandex.ru/search/?text=https%3A%2F%2Fkinokubok.pro%2F232-univer-13-let-spustja-2024-06-25-19-51.html&lr=21653
I am looking for this URL: <iframe src="https://api.stiven-king.com/storage.html" ...
How can I fetch this HTML tag's content?
My code:
import seleniumwire.undetected_chromedriver as uc
import time

options = uc.ChromeOptions()
options.add_argument('--ignore-ssl-errors=yes')
options.add_argument('--ignore-certificate-errors')
driver = uc.Chrome(options=options)

def interceptor(request):
    del request.headers['Referer']
    request.headers['Referer'] = 'https://yandex.ru/'

url = "https://125jun.kinoamor.pro/251-univer-13-let-spustja-2024-06-27-19-51.html"
driver.request_interceptor = interceptor
driver.get(url)
time.sleep(3)

iframe_tag_elements = driver.find_elements("xpath", "//iframe")
print(f"FOUND VIDEO TAGS: {len(iframe_tag_elements)}")  # prints 7
for iframe_elem in iframe_tag_elements:
    video_url = iframe_elem.get_attribute("src")
    if video_url:
        print("XXX_ ", video_url)
**PROBLEM:** the URL "https://api.stiven-king.com/storage.html" is not printed.
I also don't see the URL in driver.page_source.
I tried sleeping and scrolling the page, but it didn't help.
I also tried driver.switch_to.frame(iframe_elem) and then searching for iframes again.
Reputation: 50919
As suggested in the other answers, you need to switch to the <iframe> containing the link you are looking for. But instead of taking the first <iframe>, you can provide a more specific locator:
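from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as ec

# driver and interceptor are set up as in the question; only the URL is replaced
url = "https://01jul.kinokubok.pro/232-univer-13-let-spustja-2024-07-03-20-19.html"
driver.request_interceptor = interceptor
driver.get(url)

# wait for the src-less iframe inside the video box, then switch into it
WebDriverWait(driver, 30).until(ec.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR, "#dle-content .video-box > iframe:not([src])")))

# these are now the iframes inside that frame's document
iframe_tag_elements = driver.find_elements("xpath", "//iframe")
print(f"FOUND VIDEO TAGS: {len(iframe_tag_elements)}")
for iframe_elem in iframe_tag_elements:
    video_url = iframe_elem.get_attribute("src")
    if video_url:
        print("XXX_ ", video_url)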
Output
FOUND VIDEO TAGS: 1
XXX_ https://api.stiven-king.com/storage.html
If you want the src values of all the <iframe>s, you can build a recursive function to extract them.
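A sketch of such a recursive function that also records where each URL was found, as a tuple of iframe indices (useful if you later want to switch back to that exact frame). It steps back one level with driver.switch_to.parent_frame(), so sibling frames deeper in the tree remain reachable. collect_iframe_srcs, FakeDriver, FakeFrame and FakeSwitchTo are hypothetical names for illustration; the fakes only mimic the few WebDriver calls used, so the traversal logic can be checked without a browser.

```python
from typing import List, Tuple

Path = Tuple[int, ...]


def collect_iframe_srcs(driver, path: Path = ()) -> List[Tuple[Path, str]]:
    """Depth-first walk over nested iframes, starting from the current context.

    Returns (path, src) pairs: path is the sequence of iframe indices you would
    switch through, from the starting context, to reach the iframe with that src.
    """
    results: List[Tuple[Path, str]] = []
    for index, frame in enumerate(driver.find_elements("xpath", "//iframe")):
        frame_path = path + (index,)
        src = frame.get_attribute("src")
        if src:
            results.append((frame_path, src))
        driver.switch_to.frame(frame)
        results.extend(collect_iframe_srcs(driver, frame_path))
        # step back exactly one level, so the remaining sibling frames
        # (which belong to the parent document) can still be switched to
        driver.switch_to.parent_frame()
    return results


# --- hypothetical stand-in for a WebDriver, only for checking the traversal ---
class FakeFrame:
    def __init__(self, src=None, children=()):
        self.src = src
        self.children = list(children)

    def get_attribute(self, name):
        return self.src if name == "src" else None


class FakeSwitchTo:
    def __init__(self, driver):
        self._driver = driver

    def frame(self, frame):
        self._driver.stack.append(frame)

    def parent_frame(self):
        self._driver.stack.pop()


class FakeDriver:
    def __init__(self, top_level_frames):
        self.top = list(top_level_frames)
        self.stack = []  # frames we are currently switched into
        self.switch_to = FakeSwitchTo(self)

    def find_elements(self, by, value):
        # ignores the locator: every frame in this fake is an iframe
        return self.stack[-1].children if self.stack else self.top


# Shaped like the page discussed here: the wanted iframe sits inside a src-less one
page = FakeDriver([
    FakeFrame(children=[FakeFrame("https://api.stiven-king.com/storage.html")]),
    FakeFrame("https://www.youtube.com/embed/mthO33phh9U"),
])
for frame_path, src in collect_iframe_srcs(page):
    print(frame_path, src)
# (0, 0) https://api.stiven-king.com/storage.html
# (1,) https://www.youtube.com/embed/mthO33phh9U
```

With a real session you would call collect_iframe_srcs(driver) after driver.get() and the waits, while the driver is in the top-level document.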
To make page loading faster, you can set page_load_strategy to 'eager', but be aware that you might then have to add explicit waits, because driver.get() returns before everything on the page has loaded.
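With 'eager', driver.get() returns once the initial HTML has been parsed (DOMContentLoaded), so elements injected later, such as the player iframes, may not exist yet. The complete code below uses WebDriverWait for this; conceptually, its until() loop polls a condition until it returns something truthy or a timeout elapses. A stdlib-only sketch of that idea (wait_until is a hypothetical name, not a Selenium API; Selenium raises its own TimeoutException, while this sketch uses the built-in TimeoutError):

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Hypothetical helper: call condition() until it returns a truthy value.

    Returns the first truthy result, or raises TimeoutError once the timeout
    has elapsed without the condition being met.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll_interval)


# Usage: a condition that only becomes true on its third call
attempts = []


def third_try_succeeds():
    attempts.append(1)
    return "ready" if len(attempts) >= 3 else None


print(wait_until(third_try_succeeds, timeout=2, poll_interval=0.01))  # ready
```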
Complete code
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as ec
import seleniumwire.undetected_chromedriver as uc

options = uc.ChromeOptions()
options.add_argument('--ignore-ssl-errors=yes')
options.add_argument('--ignore-certificate-errors')
options.page_load_strategy = 'eager'
driver = uc.Chrome(options=options)

def interceptor(request):
    # send a Yandex Referer, since the page has to be opened from Yandex search
    del request.headers['Referer']
    request.headers['Referer'] = 'https://yandex.ru/'

def get_frame_data(frames):
    """Recursively collect the src of every iframe, descending into nested frames."""
    src = []
    for frame in frames:
        video_url = frame.get_attribute("src")
        if video_url:
            src.append(video_url)
        driver.switch_to.frame(frame)
        child_frames = driver.find_elements("xpath", "//iframe")
        if child_frames:
            src.extend(get_frame_data(child_frames))
        # go back one level (not to the top document), so the remaining
        # sibling frames at this level are still reachable
        driver.switch_to.parent_frame()
    return src

url = "https://01jul.kinokubok.pro/232-univer-13-let-spustja-2024-07-03-20-19.html"
driver.request_interceptor = interceptor
driver.get(url)

# with the 'eager' strategy, wait explicitly for the page content
wait = WebDriverWait(driver, 10)
wait.until(ec.visibility_of_element_located(("id", "grid")))
wait.until(ec.visibility_of_element_located(("class name", "karusel")))

iframe_tag_elements = driver.find_elements("xpath", "//iframe")
all_src = get_frame_data(iframe_tag_elements)
for sr in all_src:
    print("XXX_ ", sr)
Output 1:
XXX_ https://api.marts.ws/embed/movie/74360
XXX_ https://loosening-as.allarknow.online/?token_movie=be2b9578d8cae35323bb199f888be1&token=b5c08f668c592ee23d32031d27de44
XXX_ https://www.youtube.com/embed/mthO33phh9U
XXX_ https://yastatic.net/share2/v-1.16.0/frame.html?namespace=ya-share2.0.7255632935282506
XXX_ https://yastatic.net/share2/v-1.16.0/frame.html?namespace=ya-share2.0.027842698862642123
Output 2:
XXX_ https://api.stiven-king.com/storage.html
XXX_ https://loosening-as.allarknow.online/?token_movie=be2b9578d8cae35323bb199f888be1&token=b5c08f668c592ee23d32031d27de44
XXX_ https://www.youtube.com/embed/mthO33phh9U
XXX_ https://yastatic.net/share2/v-1.16.0/frame.html?namespace=ya-share2.0.9638742292394189
XXX_ https://yastatic.net/share2/v-1.16.0/frame.html?namespace=ya-share2.0.8570699995377056
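The two runs above put different player URLs first, so relying on the position of an item in all_src is fragile. Assuming the hostname from the question is what identifies the wanted frame, a hostname filter with urllib.parse is one option; srcs_from_host is a hypothetical helper, and on a run like Output 1 it would return an empty list.

```python
from urllib.parse import urlparse

# URLs as printed in "Output 2" above
all_src = [
    "https://api.stiven-king.com/storage.html",
    "https://loosening-as.allarknow.online/?token_movie=be2b9578d8cae35323bb199f888be1&token=b5c08f668c592ee23d32031d27de44",
    "https://www.youtube.com/embed/mthO33phh9U",
    "https://yastatic.net/share2/v-1.16.0/frame.html?namespace=ya-share2.0.9638742292394189",
    "https://yastatic.net/share2/v-1.16.0/frame.html?namespace=ya-share2.0.8570699995377056",
]


def srcs_from_host(srcs, host):
    """Keep only the URLs whose hostname matches host exactly."""
    return [src for src in srcs if urlparse(src).hostname == host]


print(srcs_from_host(all_src, "api.stiven-king.com"))
# ['https://api.stiven-king.com/storage.html']
```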
Reputation: 1906
The <iframe> with src="https://api.stiven-king.com/storage.html" is nested within another iframe; to be able to locate it, you first have to switch the context to its parent frame:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()
driver.get('https://125jun.kinoamor.pro/251-univer-13-let-spustja-2024-07-05-20-52.html')
# pass a locator tuple so the wait keeps retrying until the iframe exists
WebDriverWait(driver, 30).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR, 'iframe')))
src = driver.find_element(By.CSS_SELECTOR, 'iframe').get_attribute('src')
print(src)
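This also explains why the URL is missing from driver.page_source in the question: page_source returns the markup of the current browsing context only, so the nested iframe's markup shows up only after switching into its parent frame. As an offline illustration, here is a stdlib html.parser helper that lists iframe src values; in a real session you would feed it driver.page_source after each switch. The helper names and the two markup strings are hypothetical and simplified, not the site's actual HTML.

```python
from html.parser import HTMLParser


class IframeSrcCollector(HTMLParser):
    """Collect the src attribute of every <iframe> tag in an HTML string."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.srcs.append(src)


def iframe_srcs(html):
    parser = IframeSrcCollector()
    parser.feed(html)
    parser.close()
    return parser.srcs


# Hypothetical markup: the top-level page only has a src-less iframe,
# and the wanted iframe lives in that frame's own document.
top_document = '<div class="video-box"><iframe></iframe></div>'
inner_document = '<iframe src="https://api.stiven-king.com/storage.html"></iframe>'

print(iframe_srcs(top_document))    # []
print(iframe_srcs(inner_document))  # ['https://api.stiven-king.com/storage.html']
```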
Reputation: 507
The reason you are unable to find the video tag on that page is that the content is loaded via an iframe. My suggestion would be to switch into the iframe before searching for the element, and to switch back to the main document afterwards.
The switching can be done using the methods below:
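video_frames = driver.find_elements("xpath", "//iframe")
# switch_to.frame expects a single element (or index/name), not the whole list
driver.switch_to.frame(video_frames[0])
# ... locate the elements inside the frame here ...
driver.switch_to.default_content()  # return to the top-level document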