Reputation: 111
I am trying to scrape the website https://investing.com/ to get technical data for stocks. For "Moving Averages:" and "Technical Indicators:", I would like to get how many buys and how many sells there are for each period:
Here is an image showing the data I want to get: https://i.ibb.co/mHpM0Yw/Capture-d-e-cran-2019-08-14-a-00-15-45.png
The URL is https://investing.com/equities/credit-agricole-technical
When you open the page in a browser, the period is set to "Hourly" and you have to click another period to get the corresponding data. The DOM is updated after an XHR request.
I would like to scrape the page after the DOM has been updated.
I tried to scrape with Mechanize, clicking on "Weekly" and then reading the DOM, but I got an error.
Here is my code:
require 'mechanize'

def mechanize_scraper(url)
  agent = Mechanize.new
  puts agent.user_agent_alias = 'Mac Safari'
  page = agent.get(url)
  # Find the "Weekly" period link and try to follow it
  link = page.link_with(text: 'Weekly')
  new_page = link.click
end
url = "https://investing.com/equities/credit-agricole-technical"
mechanize_scraper(url)
Here is the error:
Mechanize::UnsupportedSchemeError (Mechanize::UnsupportedSchemeError)
When inspecting the DOM, the link's "href" attribute is "javascript:void(0);":
<li pairid="407" data-period="week" class="">
<a href="javascript:void(0);">Weekly</a>
</li>
So after some tries and a lot of Google searching, I moved on to Watir to try to scrape.
Here is my code:
require 'watir'

def watir_scraper(url)
  Watir.default_timeout = 10
  browser = Watir::Browser.new
  browser.goto(url)
  link = browser.link(text: /weekly/).click
  pp link
end
url = "https://investing.com/equities/credit-agricole-technical"
watir_scraper(url)
Here is the error:
40: from app.rb:47:in `'
39: from app.rb:32:in `watir_scraper'
38: from /Users/remicarette/.rbenv/versions/2.6.3/lib/ruby/gems/2.6.0/gems/watir-6.16.5/lib/watir/elements/element.rb:145:in `click'
37: from /Users/remicarette/.rbenv/versions/2.6.3/lib/ruby/gems/2.6.0/gems/watir-6.16.5/lib/watir/elements/element.rb:789:in `element_call'
36: from /Users/remicarette/.rbenv/versions/2.6.3/lib/ruby/gems/2.6.0/gems/watir-6.16.5/lib/watir/elements/element.rb:154:in `block in click'
35: from /Users/remicarette/.rbenv/versions/2.6.3/lib/ruby/gems/2.6.0/gems/selenium-webdriver-3.142.3/lib/selenium/webdriver/common/element.rb:74:in `click'
34: from /Users/remicarette/.rbenv/versions/2.6.3/lib/ruby/gems/2.6.0/gems/selenium-webdriver-3.142.3/lib/selenium/webdriver/remote/w3c/bridge.rb:371:in `click_element'
33: from /Users/remicarette/.rbenv/versions/2.6.3/lib/ruby/gems/2.6.0/gems/selenium-webdriver-3.142.3/lib/selenium/webdriver/remote/w3c/bridge.rb:567:in `execute'
32: from /Users/remicarette/.rbenv/versions/2.6.3/lib/ruby/gems/2.6.0/gems/selenium-webdriver-3.142.3/lib/selenium/webdriver/remote/bridge.rb:167:in `execute'
31: from /Users/remicarette/.rbenv/versions/2.6.3/lib/ruby/gems/2.6.0/gems/selenium-webdriver-3.142.3/lib/selenium/webdriver/remote/http/common.rb:64:in `call'
30: from /Users/remicarette/.rbenv/versions/2.6.3/lib/ruby/gems/2.6.0/gems/selenium-webdriver-3.142.3/lib/selenium/webdriver/remote/http/default.rb:114:in `request'
29: from /Users/remicarette/.rbenv/versions/2.6.3/lib/ruby/gems/2.6.0/gems/selenium-webdriver-3.142.3/lib/selenium/webdriver/remote/http/common.rb:88:in `create_response'
28: from /Users/remicarette/.rbenv/versions/2.6.3/lib/ruby/gems/2.6.0/gems/selenium-webdriver-3.142.3/lib/selenium/webdriver/remote/http/common.rb:88:in `new'
27: from /Users/remicarette/.rbenv/versions/2.6.3/lib/ruby/gems/2.6.0/gems/selenium-webdriver-3.142.3/lib/selenium/webdriver/remote/response.rb:34:in `initialize'
26: from /Users/remicarette/.rbenv/versions/2.6.3/lib/ruby/gems/2.6.0/gems/selenium-webdriver-3.142.3/lib/selenium/webdriver/remote/response.rb:72:in `assert_ok'
25: from 25 libsystem_pthread.dylib 0x00007fff5aaa440d thread_start + 13
24: from 24 libsystem_pthread.dylib 0x00007fff5aaa8249 _pthread_start + 66
23: from 23 libsystem_pthread.dylib 0x00007fff5aaa52eb _pthread_body + 126
22: from 22 chromedriver 0x000000010b434e67 chromedriver + 3673703
21: from 21 chromedriver 0x000000010b416014 chromedriver + 3547156
20: from 20 chromedriver 0x000000010b3e0f07 chromedriver + 3329799
19: from 19 chromedriver 0x000000010b3f91b8 chromedriver + 3428792
18: from 18 chromedriver 0x000000010b3cd069 chromedriver + 3248233
17: from 17 chromedriver 0x000000010b3f86d8 chromedriver + 3426008
16: from 16 chromedriver 0x000000010b3f8940 chromedriver + 3426624
15: from 15 chromedriver 0x000000010b3ecc1f chromedriver + 3378207
14: from 14 chromedriver 0x000000010b0ce8a5 chromedriver + 108709
13: from 13 chromedriver 0x000000010b0cd7e2 chromedriver + 104418
12: from 12 chromedriver 0x000000010b0f1bf3 chromedriver + 252915
11: from 11 chromedriver 0x000000010b0fba37 chromedriver + 293431
10: from 10 chromedriver 0x000000010b0f1c4e chromedriver + 253006
9: from 9 chromedriver 0x000000010b0cfa66 chromedriver + 113254
8: from 8 chromedriver 0x000000010b0f1a72 chromedriver + 252530
7: from 7 chromedriver 0x000000010b0cfe66 chromedriver + 114278
6: from 6 chromedriver 0x000000010b0d63fb chromedriver + 140283
5: from 5 chromedriver 0x000000010b0d71a9 chromedriver + 143785
4: from 4 chromedriver 0x000000010b0d8d19 chromedriver + 150809
3: from 3 chromedriver 0x000000010b0da569 chromedriver + 157033
2: from 2 chromedriver 0x000000010b15fcef chromedriver + 703727
1: from 1 chromedriver 0x000000010b3bf133 chromedriver + 3191091
0x000000010b42f129 chromedriver + 3649833: element click intercepted: Element ... is not clickable at point (544, 704). Other element would receive the click: ... (Selenium::WebDriver::Error::ElementClickInterceptedError)
(Session info: chrome=76.0.3809.100)
I hope all of this helps you understand my issue. I would like to know whether I can scrape this data with Mechanize or Watir. If not, which tools can do the job?
Thanks a lot!
Upvotes: 0
Views: 3452
Reputation: 6660
The error you are seeing in Watir comes from webdriver and indicates that if a human tried to click that link, some other element on the page would get clicked instead (because that other element overlaps the link).
Most likely the default browser window is small and you are dealing with a 'responsive' design that doesn't scale down well below a given size (a common issue).
Try setting the window size first to something similar to what you would normally use (e.g. 1024x768 or larger): browser.window.resize_to(1920, 1080)
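For example, applied to the watir_scraper from the question (a sketch only; 1920x1080 is just an illustrative size, and the techStudiesInnerWrap id is borrowed from the requests/bs4 answer below, so check it against the live page):

require 'watir'

def watir_scraper(url)
  Watir.default_timeout = 10
  browser = Watir::Browser.new
  # Resize before interacting so the responsive layout leaves the period links clickable
  browser.window.resize_to(1920, 1080)
  browser.goto(url)
  browser.link(text: 'Weekly').click
  # After the XHR update, read the summary block (id taken from the other answer)
  puts browser.element(id: 'techStudiesInnerWrap').text
end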
Upvotes: 0
Reputation: 84465
You can do this with just requests and bs4 using POST requests. It's the same idea as in the other answer I see, but I use a loop to cover all the requested periods. I simply used dev tools to monitor the web traffic when clicking 5hr, Daily, etc. and observed the XHR calls.
import requests
from bs4 import BeautifulSoup as bs

headers = {'User-Agent': 'Mozilla/5.0',
           'Content-Type': 'application/x-www-form-urlencoded',
           'Referer': 'https://www.investing.com/equities/credit-agricole-technical',
           'X-Requested-With': 'XMLHttpRequest'}
body = {'pairID': 407, 'period': '', 'viewType': 'normal'}
periods = {'5hr': 18000, 'Daily': 86400, 'Weekly': 'week'}

with requests.Session() as s:
    for k, v in periods.items():
        body['period'] = v
        r = s.post('https://www.investing.com/instruments/Service/GetTechincalData', data=body, headers=headers)
        soup = bs(r.content, 'lxml')
        for i in soup.select('#techStudiesInnerWrap .summaryTableLine'):
            print(k, ' : ', ' '.join([j.text for j in i.select('span')]))
Output:
Upvotes: 1
Reputation: 501
I don't think it's exactly what you're looking for, but it may get you a little closer.
Using an HTTP sniffer, I found that the link you're trying to click issues a POST request. The response of that POST can be obtained with:
require 'mechanize'

def mechanize_poster(url)
  agent = Mechanize.new
  headers = {
    'X-Requested-With' => 'XMLHttpRequest',
    'User-Agent' => 'Mac Safari',
    'Content-Type' => 'application/x-www-form-urlencoded',
    'Referer' => 'https://www.investing.com/equities/credit-agricole-technical'
  }
  fields = {
    period: 'week',
    viewType: 'normal',
    pairID: '407'
  }
  page = agent.post(url, fields, headers)
  p page
end
I think you'll need to use some Nokogiri to get at the data values.
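For instance, something along these lines inside mechanize_poster, in place of p page, could pull out the buy/sell summary (a sketch; the #techStudiesInnerWrap and .summaryTableLine selectors are borrowed from the requests/bs4 answer above, so verify them against the actual response):

require 'nokogiri'

doc = Nokogiri::HTML(page.body)
doc.css('#techStudiesInnerWrap .summaryTableLine').each do |line|
  # Each summary line contains spans with the indicator name and the buy/sell counts
  puts line.css('span').map(&:text).join(' ')
end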
Upvotes: 1