TJ1
TJ1

Reputation: 8468

Python: finding elements of a webpage to scrape in python when page content is loaded using Java script

I am trying to scrape content of a page. Let's say this is the page: http://finance.yahoo.com/quote/AAPL/key-statistics?p=AAPL

I know I need to use Selenium to get the data I want. I found this example from Stackoverflow that shows how to do it:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait

driver = webdriver.Chrome()
driver.maximize_window()
driver.get("http://finance.yahoo.com/quote/AAPL/profile?p=AAPL")

# wait for the Full Time Employees to be visible
wait = WebDriverWait(driver, 10)
employees = wait.until(EC.visibility_of_element_located((By.XPATH, "//span[. = 'Full Time Employees']/following-sibling::strong")))
print(employees.text)

driver.close()

My question is this: In the above example to find Full Time Employees the code that has been used is:

employees = wait.until(EC.visibility_of_element_located((By.XPATH, "//span[. = 'Full Time Employees']/following-sibling::strong")))

How the author has found that s/he needs to use:

"//span[. = 'Full Time Employees']/following-sibling::strong"

To find the number of employees.

For my example page: http://finance.yahoo.com/quote/AAPL/key-statistics?p=AAPL how can I find for example Trailing P/E?

Upvotes: 2

Views: 3935

Answers (3)

Igor Savinkin
Igor Savinkin

Reputation: 6267

Can you please tell me the steps you took to find this? I do right click and choose Inspect, but then what shall I do?

A picture is worth of thousand words.

enter image description here In web dev. tools (F12) you do the following steps:

  1. Choose Elements tab
  2. Press Element Selector button
  3. With that button pressed you click an element in the main browser window.
  4. In the DOM-elements window you right-click that highlighted element.
  5. The context menu gets transpired and you choose Copy.
  6. Choose Copy XPath in a sub menu. Now you have that element xpath in a console buffer.

NOTE!

The browser makes/composes an element xpath based on its own algorithm. It might not be the way you think or the way that fits to your code. So, you have to understand xpath in nature, be acquainted with it.

See what xpath the Chrome browser has issued for Trailing P/E:

//*[@id="main-0-Quote-Proxy"]/section/div[2]/section/div/section/div[2]/div[1]/div[1]/div/table/tbody/tr[3]/td[1]/span

Upvotes: 2

NarendraR
NarendraR

Reputation: 7708

Here I have the answer for all your confusions.

It will be better to look on some xpath tutorials and do practice from yourself, then you will be able to decide what you have to use . There are so many site. You can start Here or Here

Now come to your Query -

Suppose I am using following xpath to locate the element

//h3/span[text()='Financial Highlights']/../preceding-sibling::div//tr[3]/td/span
  1. Your requirement to find Trailing P/E in your page, definatly you will look unique xpath which won't change. If you try to find this using firepath it shows some lengthy xpath

  2. Now you will check alternative and find another element (may be sibling, child or ancestor of your element) based on that you can to locate your element

    • in My case, first will find the Financial Highlights text which I will be able to find using //h3/span[text()='Financial Highlights']
    • Now I move its parent tag which is h3 and I will do this using /..
    • I have Trailing P/E element in just above the current node so move on just above node using /preceding-sibling::div
    • And finally find your element in that <div> like -//tr[3]/td/span

See the screens as well -

Step 1 : enter image description here

Step 2 : enter image description here

Step 3 : enter image description here

Step 4 : enter image description here

Upvotes: 0

eLRuLL
eLRuLL

Reputation: 18799

'//h3[contains(., "Valuation Measures")]/following-sibling::div[1]//tr[3]'

Upvotes: 1

Related Questions