Reputation: 791
I am having a hard time working out the correct XPath in my web scraping code.
I am trying to scrape several pieces of information from http://financials.morningstar.com/company-profile/c.action?t=AAPL. Some of the paths I have tried work and some do not. I am interested in the CIK under Operation Details.
import requests
from lxml import html

page = requests.get('http://financials.morningstar.com/company-profile/c.action?t=AAPL')
tree = html.fromstring(page.text)
#desc = tree.xpath('//div[@class="r_title"]/span[@class="gry"]/text()') #works
#desc = tree.xpath('//div[@class="wrapper"]//div[@class="headerwrap"]//div[@class="h_Logo"]//div[@class="h_Logo_row1"]//div[@class="greeter"]/text()') #works
#desc = tree.xpath('//div[@id="OAS_TopLeft"]//script[@type="text/javascript"]/text()') #works
desc = tree.xpath('//div[@class="col2"]//div[@id="OperationDetails"]//table[@class="r_table1 r_txt2"]//tbody//tr//th[@class="row_lbl"]/text()')
I can't figure out the last path. It seems like I am following the structure correctly, but I get an empty list.
Upvotes: 3
Views: 2338
Reputation: 473753
The problem is that the Operation Details are loaded separately with an additional GET request. Simulate that request in your code while maintaining a web-scraping session:
import requests
from lxml import html

with requests.Session() as session:
    # load the main page first so the session picks up any cookies
    page = session.get('http://financials.morningstar.com/company-profile/c.action?t=AAPL')
    tree = html.fromstring(page.text)

    # get the operation details
    response = session.get("http://financials.morningstar.com/company-profile/component.action", params={
        "component": "OperationDetails",
        "t": "XNAS:AAPL",
        "region": "usa",
        "culture": "en-US",
        "cur": "",
        "_": "1444848178406"
    })
    tree_details = html.fromstring(response.content)
    # print() with a single argument works on both Python 2 and Python 3
    print(tree_details.xpath('.//th[@class="row_lbl"]//text()'))
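If you want to reuse the request above for other tickers, the query parameters can be built by a small helper. This is only a sketch: operation_details_params is a hypothetical name, the exchange prefix is assumed to be XNAS as in the AAPL example, and treating "_" as a millisecond timestamp used for cache busting is my assumption (the hard-coded value above looks like one), not something the site documents.

```python
import time
from urllib.parse import urlencode

# URL taken from the request above
COMPONENT_URL = "http://financials.morningstar.com/company-profile/component.action"


def operation_details_params(ticker, exchange="XNAS"):
    """Hypothetical helper: build the query parameters for the component request."""
    return {
        "component": "OperationDetails",
        "t": "{}:{}".format(exchange, ticker),
        "region": "usa",
        "culture": "en-US",
        "cur": "",
        # Assumption: "_" is a millisecond timestamp used only for cache busting
        "_": str(int(time.time() * 1000)),
    }


params = operation_details_params("AAPL")
# Show the URL that would be requested; no network call is made here
print(COMPONENT_URL + "?" + urlencode(params))
```

The resulting dictionary can be passed as params= to session.get() in the same way as the literal one.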
Old answer:
It's just that you should remove tbody from the expression:
//div[@class="col2"]//div[@id="OperationDetails"]//table[@class="r_table1 r_txt2"]//tr//th[@class="row_lbl"]/text()
tbody is an element that is inserted by the browser to define the data rows in a table.
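To see the tbody effect in isolation, here is a small sketch using made-up markup that mimics the table (the HTML string and its cell value are illustrative, not the real page). lxml parses the source as written and does not add a tbody, so a path containing tbody matches nothing:

```python
from lxml import html

# Made-up markup for illustration; note that the source has no <tbody>
snippet = """
<html><body>
<div id="OperationDetails">
  <table class="r_table1 r_txt2">
    <tr><th class="row_lbl">CIK</th><td>example value</td></tr>
  </table>
</div>
</body></html>
"""

tree = html.fromstring(snippet)

# Path copied from browser dev tools style, including tbody
with_tbody = tree.xpath(
    '//div[@id="OperationDetails"]//table[@class="r_table1 r_txt2"]'
    '//tbody//tr//th[@class="row_lbl"]/text()'
)

# Same path with tbody removed
without_tbody = tree.xpath(
    '//div[@id="OperationDetails"]//table[@class="r_table1 r_txt2"]'
    '//tr//th[@class="row_lbl"]/text()'
)

print(with_tbody)
print(without_tbody)
```

The first list comes back empty and the second contains the label, because the tbody only exists in the browser's DOM, not in the HTML that requests downloads.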
Upvotes: 3