Reputation: 8468
I am trying to scrape content of a page. Let's say this is the page: http://finance.yahoo.com/quote/AAPL/key-statistics?p=AAPL
I know I need to use Selenium
to get the data I want.
I found this example from Stackoverflow that shows how to do it:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait
driver = webdriver.Chrome()
driver.maximize_window()
driver.get("http://finance.yahoo.com/quote/AAPL/profile?p=AAPL")
# wait for the Full Time Employees to be visible
wait = WebDriverWait(driver, 10)
employees = wait.until(EC.visibility_of_element_located((By.XPATH, "//span[. = 'Full Time Employees']/following-sibling::strong")))
print(employees.text)
driver.close()
My question is this: In the above example to find Full Time Employees the code that has been used is:
employees = wait.until(EC.visibility_of_element_located((By.XPATH, "//span[. = 'Full Time Employees']/following-sibling::strong")))
How the author has found that s/he needs to use:
"//span[. = 'Full Time Employees']/following-sibling::strong"
To find the number of employees.
For my example page: http://finance.yahoo.com/quote/AAPL/key-statistics?p=AAPL how can I find for example Trailing P/E?
Upvotes: 2
Views: 3935
Reputation: 6267
Can you please tell me the steps you took to find this? I do right click and choose Inspect, but then what shall I do?
A picture is worth of thousand words.
In web dev. tools (F12) you do the following steps:
The browser makes/composes an element xpath based on its own algorithm. It might not be the way you think or the way that fits to your code. So, you have to understand xpath in nature, be acquainted with it.
See what xpath the Chrome browser has issued for Trailing P/E:
//*[@id="main-0-Quote-Proxy"]/section/div[2]/section/div/section/div[2]/div[1]/div[1]/div/table/tbody/tr[3]/td[1]/span
Upvotes: 2
Reputation: 7708
Here I have the answer for all your confusions.
It will be better to look on some xpath
tutorials and do practice from yourself, then you will be able to decide what you have to use .
There are so many site. You can start Here or Here
Now come to your Query -
Suppose I am using following xpath
to locate the element
//h3/span[text()='Financial Highlights']/../preceding-sibling::div//tr[3]/td/span
Your requirement to find Trailing P/E in your page, definatly you will look unique xpath which won't change. If you try to find this using firepath
it shows some lengthy xpath
Now you will check alternative and find another element (may be sibling, child or ancestor of your element) based on that you can to locate your element
//h3/span[text()='Financial Highlights']
h3
and I will do this using /..
/preceding-sibling::div
<div>
like -//tr[3]/td/span
See the screens as well -
Upvotes: 0
Reputation: 18799
'//h3[contains(., "Valuation Measures")]/following-sibling::div[1]//tr[3]'
Upvotes: 1