Stephen

Reputation: 163

find_element_by_xpath in Python

I am searching for the part of the webpage that goes:

<TR class='title'><TD colspan=3 bgcolor=#C0C0C0>Order number 6097279</TD></TR>

I want to pull the number, which changes every time, out of the text (i.e., 'Order number 6097279' should give me the string '6097279').

I have tried the following and get an 'unable to locate element' error:

order_number = order_products.find_element_by_xpath("//TR[@class='title']");

The traceback is as follows:

in call_orderpage(https://www.daz3d.com/i/account/orderdetail?order=8104987)
Failed!
Error (NoSuchElementException): Message u'Unable to locate element: {"method":"xpath","selector":"//tr[@class=\'title\']"}'
Program finished!

Changing TR to tr does not make a difference.

Why isn't this working? I have other find_element_by_xpath searches that use the same @class='...' phrasing, and they work.


The code I am using:

order_number = order_products.find_element_by_xpath("//TR[@class='title']");

According to what I have read, XPath element names are not case sensitive, but I get the same error with either case.

The response I get: [screenshot not preserved]

As for not "accepting" answers, I apologize for that. Part of it is that I did not know I had to; the other part is that I am not getting notified when there HAS been a response.

I'll go back and make the correction.

=====================================

Edit to respond to gfortune...

I updated your question with the information you provided in an answer. In the future, either reply here in a comment or edit your question directly and update it to contain the additional information. Editing your question is the best approach. I've submitted an edit for peer review so hopefully that shows up soon. Unfortunately, we're still missing a fair amount of the context for your question. A) What library are you using? B) More code. One line probably isn't enough. C) More information on the page you are parsing. Ideally, a very short test case that triggers the error. – gfortune

I'll be honest, the small print in the "comment" panel is hard to read. Plus I am not getting notified that a response has been made...

Anyway, I am not using lxml - because I hadn't understood what it was. Now that I have a better idea I'll look into how it works, thanks.

The order_products variable is just a sub-block of the entire HTML; it is the part that holds the HTML I need to work with. The page itself has a lot of sub-lists of links and other things I have no need for, so I separated this block out to have less to worry about when searching for the data I do need.

And I thought that I had posted a correction of my code that did work - here it is again.

order_number = order_products.find_element_by_xpath("//tr[@class='title even']");

You will notice 'title even' in place of the 'title' class. FirePath showed me that hidden bit, which was confusing both me and the XPath search.

My code to work with this then became:

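# Note: on a WebElement, an XPath starting with '//' searches the whole
# document; './/' would restrict the search to order_products.
order_number = order_products.find_element_by_xpath("//tr[@class='title even']")
order_number = order_number.text
order_number = order_number.replace('Order number ', '')
print('\nOrder number [' + order_number + ']')

which strips the "Order number " prefix and leaves only the number.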
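The 'title even' fix works because an XPath predicate such as @class='title' compares the whole attribute value, so a row whose class is 'title even' does not match. The sketch below illustrates the difference using only the standard library's xml.etree.ElementTree, which supports a limited XPath subset. FRAGMENT, has_class and TOKEN_XPATH are hypothetical names made up for this illustration, and the fragment is modelled on the row in the question, not copied from the real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment modelled on the row described in the question.
FRAGMENT = """
<table>
  <tr class="title even"><td colspan="3">Order number 6097279</td></tr>
  <tr class="row odd"><td>Something else</td></tr>
</table>
"""

root = ET.fromstring(FRAGMENT)

# Exact comparison: the attribute value is 'title even', not 'title',
# so this finds nothing.
exact_matches = root.findall(".//tr[@class='title']")


def has_class(element, name):
    """Return True if `name` is one of the whitespace-separated class tokens."""
    return name in (element.get("class") or "").split()


# Token comparison: matches the row regardless of any extra classes.
token_matches = [tr for tr in root.iter("tr") if has_class(tr, "title")]

# The usual XPath 1.0 idiom for the same token test, for engines with full
# XPath support (for example lxml, or a browser driven by Selenium).
# ElementTree does not support these functions, so it is only stored here.
TOKEN_XPATH = "//tr[contains(concat(' ', normalize-space(@class), ' '), ' title ')]"

print(len(exact_matches))
print(token_matches[0].find("td").text)
```

Matching on the class token rather than the full string should keep working if the page later alternates 'title even' and 'title odd'.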

Upvotes: 2

Views: 9852

Answers (1)

gfortune

Reputation: 2619

Based on the new information, I'm going to make some blind guesses and get an answer started that we can improve on as we learn more.

First, it doesn't appear you're using lxml. I've coded up a solution that works in lxml so if you're able to switch to lxml for your parsing/xpath needs, you should be able to use this directly. If not, you might offer some info on why you aren't using lxml.

Second, the error message implies that the element doesn't exist. Are you certain that a tr with class='title' exists in the document you're reading in? Run your code against a test file that you're certain contains the html that you need. I'll provide some sample html that works.
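One way to check what the document really contains is to list the class attribute of every table row before writing the XPath. This is a minimal sketch using only the standard library's html.parser; RowClassCollector, list_row_classes and SAMPLE are hypothetical names for illustration, and SAMPLE is made up from the row in the question. With Selenium, the input text could presumably come from driver.page_source, assuming driver is your WebDriver instance.

```python
from html.parser import HTMLParser


class RowClassCollector(HTMLParser):
    """Collects the raw class attribute of every <tr> start tag."""

    def __init__(self):
        super().__init__()
        self.row_classes = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, so 'TR' arrives as 'tr'.
        if tag == "tr":
            self.row_classes.append(dict(attrs).get("class"))


def list_row_classes(html_text):
    """Return the class attribute (or None) of each <tr>, in document order."""
    collector = RowClassCollector()
    collector.feed(html_text)
    collector.close()
    return collector.row_classes


# Hypothetical sample modelled on the markup quoted in the question.
SAMPLE = (
    "<TABLE>"
    "<TR class='title even'><TD colspan=3 bgcolor=#C0C0C0>"
    "Order number 6097279</TD></TR>"
    "<TR><TD>Boring stuff</TD></TR>"
    "</TABLE>"
)

print(list_row_classes(SAMPLE))
```

If the list shows a value such as 'title even' where 'title' was expected, an exact @class='title' comparison will not match that row.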

As promised, below is an example using lxml.html to parse a sample file and extract the order number. If there are specific reasons this won't work, please post the relevant information in a comment, and I'll adjust the example for you. If you simply can't switch to lxml, we'll need a good deal more information, as requested in the comments. Please edit your original question as needed (there is a small edit button below it).

test.py

import lxml.html

data = lxml.html.parse('test.html')

orders = data.xpath('//tr[@class="title"]/td')

for order in orders:
    print('Order text: ' + order.text)
    print('Parsed order number: ' + order.text.split(' ')[-1])

test.html

<html>
<head><title>Test</title></head>
<body>
Blah blah
<div>Ignore me</div>
<div>Outer stuff
    <table border="1">
        <tr><td>bogus stuff we don't care about</td></tr>
        <tr class='title'><td color='grey'>Order Number 6097279</td></tr>
        <tr class='something_else'><td>Boring stuff</td></tr>
    </table>
</div>
</body>
</html>

Output

Order text: Order Number 6097279
Parsed order number: 6097279
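Splitting on spaces works for the sample above. As a slightly more defensive alternative, a regular expression can pull out the digits regardless of capitalisation ("Order number" vs "Order Number") and return nothing when the text is not an order line. This is only a sketch; ORDER_RE and parse_order_number are hypothetical names, and the pattern assumes the number is a plain run of digits.

```python
import re

# Assumption: the order number is a run of digits after the words
# "order number", in any capitalisation.
ORDER_RE = re.compile(r"order\s+number\s+(\d+)", re.IGNORECASE)


def parse_order_number(text):
    """Return the digits following 'Order number', or None if not present."""
    match = ORDER_RE.search(text or "")
    return match.group(1) if match else None


print(parse_order_number("Order Number 6097279"))
print(parse_order_number("Boring stuff"))
```

Returning None for non-matching text lets the caller skip rows that do not carry an order number instead of printing a wrong value.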

Upvotes: 2
