Reputation: 1876
I'm quite new to PhantomJS. I want to automate a website with PhantomJS by manipulating its HTML elements (for example, finding buttons/links by their IDs and triggering clicks), going from one page to another and doing the same things there.
What I was wondering is this: the site is a web system that was created by, and belongs to, some company. If they decide to completely redesign their system in the future, my whole work will be lost, since they will have a completely new design with a new HTML structure and new element IDs. Is that correct? Is there a way to handle this problem?
Upvotes: 1
Views: 276
Reputation: 532
I had the same problem some time ago with CasperJS/PhantomJS. At the moment I split my casperjs-tests
With this structure you only have to update the selectors in the config file (as long as the site's functionality doesn't change). If you like that structure, have a look at the page object pattern. Maintaining this test structure is pretty easy, even if a redesign happens.
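A minimal sketch of that idea, assuming a login page: the selectors live in one config object and a small page object exposes user-level actions. The names `selectors`, `LoginPage`, `logIn` and the `#login-button` ID are made up for illustration; only `sendKeys()` and `click()` are CasperJS calls.

```javascript
// Config (hypothetical): the only place that knows the site's markup.
var selectors = {
  login: {
    username: 'input[name="username"]',
    password: 'input[name="password"]',
    submit: '#login-button' // assumed ID, purely illustrative
  }
};

// Page object: describes what a user can do on the page,
// not how the page happens to be built.
function LoginPage(casper, sel) {
  this.casper = casper; // any object offering sendKeys() and click()
  this.sel = sel;
}

LoginPage.prototype.logIn = function (user, pass) {
  this.casper.sendKeys(this.sel.username, user);
  this.casper.sendKeys(this.sel.password, pass);
  this.casper.click(this.sel.submit);
};
```

Inside a CasperJS step this could be used as `new LoginPage(casper, selectors.login).logIn('alice', 'secret')`. After a redesign, only the `selectors` object should need to change, provided the login flow itself stays the same.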
Upvotes: 1
Reputation: 61892
Most likely yes, you will need to rewrite your script after the redesign. Let's go through some scenarios and how you should write your scripts.
If you rely on simple CSS selectors or XPath expressions to select input elements you want to fill or buttons you want to click, then there is a good chance that you will need to change those selectors afterwards. It doesn't always have to be this way, because some (most?) sites use sensible name attributes to label their input elements. Think of the username and password fields for logging in. Chances are, those are named "username" and "password" even on non-English-speaking sites and even after a redesign.
Look for "canonical" elements.
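One way to lean on such "canonical" attributes is to keep an ordered list of candidate selectors and use the first one that matches. This is only a sketch: `usernameCandidates` and `firstMatching` are hypothetical names, and the selectors are guesses that would have to be adapted to the real site.

```javascript
// Candidate selectors, ordered from most to least redesign-resistant.
// All of them are assumptions about what a login form might look like.
var usernameCandidates = [
  'input[name="username"]',
  'input[type="email"]',
  '#user'
];

// Returns the first candidate for which exists(selector) is true,
// or null when none of them matches.
function firstMatching(candidates, exists) {
  for (var i = 0; i < candidates.length; i++) {
    if (exists(candidates[i])) {
      return candidates[i];
    }
  }
  return null;
}
```

In CasperJS the predicate could be `function (s) { return casper.exists(s); }`. A `null` result is a useful early signal that the page has changed more than the script can absorb.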
A strictly CSS redesign probably wouldn't break your current script, but a redesign may also include technical changes, such as a move from a multi-page web application to a single-page one. If you rely heavily on page.onLoadFinished / casper.then() to wait for the next page to load in a multi-page app, this won't work anymore after a redesign to a single-page app. You would have to make extensive use of a waitFor() helper (as in the PhantomJS examples) / casper.waitFor() to wait for a specific (part of a) title or a specific ("canonical") element to appear. The most redesign-proof option would be setTimeout() / casper.wait() with a large enough timeout, since it doesn't depend on the page at all, but that is of course not practical: your script would idle a lot even when the page is fully loaded and all elements are there.
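For plain PhantomJS, the waiting described above can be sketched as a small polling helper, similar in spirit to the waitfor.js example that ships with PhantomJS. The parameter names and default values here are assumptions, not part of any API.

```javascript
// Polls condition() every intervalMs until it returns true (then calls
// onReady) or until timeoutMs has elapsed (then calls onTimeout).
function waitFor(condition, onReady, onTimeout, timeoutMs, intervalMs) {
  var maxWait = timeoutMs || 5000;  // assumed default
  var interval = intervalMs || 100; // assumed default
  var start = Date.now();
  var timer = setInterval(function () {
    if (condition()) {
      clearInterval(timer);
      onReady();
    } else if (Date.now() - start >= maxWait) {
      clearInterval(timer);
      onTimeout();
    }
  }, interval);
}

// Possible use in a PhantomJS script (the selector is illustrative):
// waitFor(function () {
//   return page.evaluate(function () {
//     return !!document.querySelector('input[name="q"]');
//   });
// }, onSearchReady, onGiveUp, 10000);
```

Because the condition checks for an element rather than a page load event, the same call keeps working whether the site is a multi-page or a single-page application. With CasperJS, casper.waitFor() or casper.waitForSelector() covers the same need.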
If you can assume that the language doesn't change during the redesign (button labels and such), you can use XPath expressions to select elements based on the text inside them. For example, if the search button text doesn't change, but the element may change from a simple link to an input element or a button (in any direction), then you can use an XPath expression similar to this one:
"//*[(contains(text(), 'yourText') and (local-name()='a' or local-name()='button')) or (local-name()='input' and @*[contains(., 'yourText')])]"
You can easily select elements by XPath by using document.evaluate() inside of page.evaluate(). CasperJS provides an XPath helper utility (require('casper').selectXPath). Nearly all CasperJS functions that take a selector handle CSS selectors as well as XPath expressions.
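To avoid hand-editing a long XPath string for every label, the expression can be generated from the label text. The sketch below is a variant of the text-based lookup described above: it matches on the element's whole string value via `contains(., ...)`, and the helper names `xpathLiteral` and `clickableByText` are hypothetical.

```javascript
// Quotes a JavaScript string as an XPath 1.0 string literal.
// XPath 1.0 has no escape character, so mixed quotes need concat().
function xpathLiteral(text) {
  if (text.indexOf("'") === -1) {
    return "'" + text + "'";
  }
  if (text.indexOf('"') === -1) {
    return '"' + text + '"';
  }
  return "concat('" + text.split("'").join("', \"'\", '") + "')";
}

// Builds an expression matching a link or button containing the text,
// or an input with any attribute (value, title, ...) containing it.
function clickableByText(text) {
  var lit = xpathLiteral(text);
  return "//*[((local-name()='a' or local-name()='button') and " +
    "contains(., " + lit + ")) or " +
    "(local-name()='input' and @*[contains(., " + lit + ")])]";
}
```

With CasperJS this could be used as `casper.click(require('casper').selectXPath(clickableByText('Search')))`, where 'Search' stands in for whatever label the site actually uses.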
If you're scraping tables, you can invest extra effort up front by not relying on the table structure and instead writing a heuristic that detects tabular data even if it is made up of divs and spans. This is complicated to do well, and probably overkill when a redesign is only a possibility at some point in the future.
It's still possible that the redesign will change the page structure, such as splitting a single page into multiple ones, which you really can't do anything about ahead of time.
Upvotes: 1