Reputation: 3782
I have a Python script that scrapes various data. For example, it scrapes the Website link
from this HTML:
<a data-ix="show-popup-on-click" target="_blank" rel="nofollow" href="https://mylink.org/" class="button full w-button" style="transition: all 0.4s ease 0s;">Website</a>
It was working properly, but now it fails with the error:
NoSuchElementException: Message: {"errorMessage":"Unable to find element with link text 'Website'","request":{"headers":{"Accept":"application/json","Accept-Encoding":"identity","Connection":"close","Content-Length":"95","Content-Type":"application/json;charset=UTF-8","Host":"127.0.0.1:40581","User-Agent":"Python http auth"},"httpVersion":"1.1","method":"POST","post":"{\"using\": \"link text\", \"sessionId\": \"a7a441f0-0f6a-11e8-ad3a-6121f74a30f4\", \"value\": \"Website\"}","url":"/element","urlParsed":{"anchor":"","query":"","file":"element","directory":"/","path":"/element","relative":"/element","port":"","host":"","password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/element","queryKey":{},"chunks":["element"]},"urlOriginal":"/session/a7a441f0-0f6a-11e8-ad3a-6121f74a30f4/element"}} Screenshot: available via screen
This is my code:
import requests
from bs4 import BeautifulSoup
from selenium import webdriver
driver = webdriver.PhantomJS()
driver.set_window_size(1120, 550)
driver.get(link)
driver.implicitly_wait(10)
website = driver.find_element_by_link_text("Website").get_attribute("href")
What am I doing wrong?
UPDATE:
<div class="column-space w-col w-col-4">
<a data-ix="show-popup-on-click" target="_blank"
rel="nofollow" href="https://example.com/"
class="button full w-button"
style="transition: all 0.4s ease 0s;">Website</a>
<div class="space big"></div>
<a target="_blank" rel="nofollow"
href="https://example.com/storage/b/2/0/2/WhitepaperLive.pdf"
class="button-2 w-button">Whitepaper</a>
<div class="space big"></div>
<a class="button-2 w-condition-invisible w-button">Program</a>
<div class="space big w-condition-invisible"></div>
<div>
<div class="div-block-4 w-clearfix">
<div class="div-block-2">Token:</div>
<div class="div-block-5 w-clearfix">
<div class="text-block-12">UTC</div>
</div>
</div>
<div class="div-block-4 w-clearfix">
<div class="div-block-2">Price:</div>
<div class="div-block-5 w-clearfix">
<div class="text-block-12">1 LUC=0,05 USD</div>
</div>
</div>
<div class="div-block-4 w-clearfix">
<div class="div-block-2">Buy with:</div>
<div class="div-block-5 w-clearfix">
<div class="text-block-12">USD, EUR</div>
</div>
</div>
<div class="div-block-4 w-clearfix">
<div class="div-block-2">Platform:</div>
<div class="div-block-5 w-clearfix">
<div class="text-block-12">MyPlatform</div>
</div>
</div>
<div class="div-block-4 w-clearfix w-condition-invisible">
<div class="div-block-2">KYC:</div>
<div class="div-block-5 w-clearfix">
<div class="text-block-12">No</div>
</div>
</div>
<div class="div-block-4 w-clearfix">
<div class="div-block-2">KYC:</div>
<div class="div-block-5 w-clearfix">
<div class="text-block-12">Yes</div>
</div>
</div>
<div class="div-block-4 w-clearfix">
<div class="div-block-2">Location:</div>
<div class="div-block-5 w-clearfix">
<div class="text-block-12">Malta</div>
</div>
</div>
<div class="div-block-4 w-clearfix">
<div class="div-block-2">Can't join:</div>
<div class="div-block-5 w-clearfix">
<div class="text-block-12">USA</div>
</div>
</div>
<div class="space big"></div>
<div class="div-block-4 w-clearfix">
<div class="div-block-2">Start:</div>
<div class="div-block-5 w-clearfix">
<div class="text-block-12">January 25, 2018</div>
</div>
</div>
<div class="div-block-4 w-clearfix">
<div class="div-block-2">End:</div>
<div class="div-block-5 w-clearfix">
<div class="text-block-12">February 5, 2018</div>
</div>
</div>
<div class="space big"></div>
<div class="div-block-4 w-clearfix">
<div class="div-block-2">Start2:</div>
<div class="div-block-5 w-clearfix">
<div class="text-block-12">February 12, 2018</div>
</div>
</div>
<div class="div-block-4 w-clearfix">
<div class="div-block-2">End2:</div>
<div class="div-block-5 w-clearfix">
<div class="text-block-12">March 5, 2018</div>
</div>
</div>
<div>
<div class="div-block-33">
<div class="space big"></div>
<div>
<a target="_blank" rel="nofollow"
class="button green full w-condition-invisible w-button">JOIN WHITELIST NOW »</a>
<div class="div-block-34">
<a target="_blank" rel="nofollow" href="http://we-do-not-have-slack.com"
class="link-block-2 w-inline-block">
<img src="https://global-uploads.webflow.com/903_slack-symbol.png" alt="ICO Slack link">
</a>
<a target="_blank" rel="nofollow" href="https://twitter.com/live" class="link-block-2 w-inline-block">
<img src="https://global-uploads.webflow.com/f4000142b091_twitter%20(1).png" width="16" alt="ICO Twitter link">
</a>
<a target="_blank" rel="nofollow" href="https://t.me/live" class="link-block-2 w-inline-block">
<img src="https://global-uploads.webflow.com/790001798dfe_telegram.png" alt="ICO Telegram link">
</a>
<a target="_blank" rel="nofollow" href="http://we-do-not-have-GitHub.com" class="link-block-2 w-inline-block">
<img src="https://global-uploads.webflow.com/59cf77c1fb0edc0001b4b26a_github-logo.png" alt="ICO GitHun link">
</a>
<a target="_blank" rel="nofollow" href="https://www.facebook.com/Play2Live-504880049864038/" class="link-block-2 w-inline-block">
<img src="https://global-uploads.webflow.com/59cf77c1fb0edc0001b4b117/59d510290116ac0001964c8e_facebook.png" alt="Facebook link">
</a>
<a target="_blank" rel="nofollow" href="https://talk.org/index.php?topic=2381679.0" class="link-block-2 w-inline-block">
<img src="https://global-uploads.webflow.com/0011f8c3c_talk.jpg" alt="Talk link">
</a>
</div>
</div>
</div>
</div>
</div>
</div>
Upvotes: 1
Views: 131
Reputation: 2267
There is no problem with the code. When I inspect the Website
link on the web page, I can see the text "Website", but when I use that same text to find the element by link text, as below, I get a NoSuchElementException:
website = driver.find_element_by_link_text("Website").get_attribute("href")
print(website)
I tried adding waits and also used partial_link_text,
but no luck.
Then I fetched all elements with the tag name "a" and printed their text with the code below:
elements = driver.find_elements_by_tag_name("a")
for element in elements:
    print(element.text)
That showed me the text is not "Website" but "WEBSITE". I am not sure why it behaves like this.
After changing all the characters of "Website" to capitals, I was able to identify the element and fetch the href
from it:
driver.get("https://topicolist.com/ico/adhive")
website = driver.find_element_by_link_text("WEBSITE").get_attribute("href")
print(website)
Hope this solves your problem.
Upvotes: 1
Reputation: 76
This error occurs when Selenium can't find the element in the HTML DOM.
My guess is that you set your implicit wait too late, so Selenium tries to get the element before the page has loaded and the element is present in the HTML DOM.
driver.get(link)
driver.implicitly_wait(10)
The documentation sets up the implicit wait before getting any pages:
driver = webdriver.PhantomJS()
driver.implicitly_wait(10)
driver.get(link)
With the implicit wait set, Selenium keeps polling the DOM for up to 10 seconds when it looks for the anchor tag element, instead of failing immediately.
DocLink: http://selenium-python.readthedocs.io/waits.html#implicit-waits
Also, if none of the elements on the page you are scraping are loaded or created via JavaScript, you don't need Selenium for simple text extraction. You could just use the standard library's urllib.request to get the page and then parse it with BeautifulSoup.
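To illustrate the no-browser route, here is a minimal sketch that pulls the href out of static HTML. It uses the standard library's html.parser instead of BeautifulSoup so it runs without extra packages; the LinkCollector and find_href names and the sample markup are made up for this example, and it assumes the link is present in the raw HTML rather than generated by JavaScript.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (text, href) pairs for every <a> element in a document."""

    def __init__(self):
        super().__init__()
        self.links = []        # finished (text, href) pairs
        self._in_link = False  # True while inside an open <a>
        self._href = None      # href of the <a> currently open
        self._text = []        # text fragments seen inside the open <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_link = True
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._in_link:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            self.links.append(("".join(self._text).strip(), self._href))
            self._in_link = False


def find_href(html, link_text):
    """Return the href of the first <a> whose text equals link_text, ignoring case."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    for text, href in parser.links:
        if text.lower() == link_text.lower():
            return href
    return None


# Sample markup modelled on the question's HTML (hypothetical URL).
sample = (
    '<a target="_blank" rel="nofollow" href="https://example.com/" '
    'class="button full w-button">Website</a>'
    '<a class="button-2 w-button">Program</a>'
)
print(find_href(sample, "website"))
```

For a live page, the html argument could come from urllib.request.urlopen(link).read().decode(), assuming the page is UTF-8 encoded. Because this reads the raw markup, CSS styling such as uppercase button text has no effect on the match.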
UPDATE:
As Ian said in the comments, the position of the implicit wait doesn't matter in this case.
The problem was the locator strategy.
website = driver.find_element_by_link_text('Website').get_attribute('href')
In this case it couldn't find the element, which is a link styled as a button with uppercase lettering (WEBSITE). The link-text locator seems to match not the link text in the HTML DOM ("Website") but the text as rendered after CSS styling ("WEBSITE").
Another locator strategy, such as a CSS selector or XPath, seems to me to deliver more reliable results:
driver.find_element_by_xpath("//a[contains(text(),'Website')]").get_attribute("href")
More information on these can be found here: Selenium Locating Elements
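If you don't want to depend on the exact casing at all, one option is an XPath that lowercases the link text before comparing. The sketch below is only an illustration: link_text_xpath and link_href are hypothetical helper names, the text is assumed to be plain ASCII without single quotes, and the driver is assumed to be a Selenium WebDriver whose generic find_element(by, value) method accepts the string "xpath" (the value of By.XPATH).

```python
import string


def link_text_xpath(text):
    """Build an XPath for an <a> whose text equals `text`, ignoring ASCII case.

    Assumes `text` contains no single quote, since XPath 1.0 string
    literals have no escape sequence for it.
    """
    if "'" in text:
        raise ValueError("text must not contain a single quote")
    # translate() maps each uppercase letter to its lowercase counterpart.
    return (
        "//a[translate(normalize-space(.), '%s', '%s') = '%s']"
        % (string.ascii_uppercase, string.ascii_lowercase, text.lower())
    )


def link_href(driver, text):
    """Return the href of the matching link; `driver` is a Selenium WebDriver."""
    # "xpath" is the value of Selenium's By.XPATH constant.
    element = driver.find_element("xpath", link_text_xpath(text))
    return element.get_attribute("href")


print(link_text_xpath("Website"))
```

With a running driver, link_href(driver, "Website") and link_href(driver, "WEBSITE") would build the same expression, so the lookup no longer depends on how the button text is cased.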
Upvotes: 1