I am trying to scrape a web page with Capybara, which is working fine, except that I am having trouble with a certain page. It has a dropdown list defined with the following HTML:
<select onChange="this.form.submit();" id="AcctNumber" name="AcctNum">
<option value="MU:P2" selected="true">Investment - 2845</option>
<option value="MU:P0">Patrick UGMA - 1585</option>
<option value="MU:P1">Lisa UGMA - 1655</option>
I have tried to select a value with many variations on this theme:
selector = 'Investment - 2845'
selector = 'Investment - 2845'
selector = 'Investment - 2845'
select selector, :from => "AcctNumber"
All of these (and many more) produce ElementNotFound errors.
Is there a way to just use a regular expression, say /Invest/ or /Pat/ or /Lisa/, to select the item? It sure would be easier than trying to guess what literal string will match the mysterious whitespace around those hyphens.
Your problem is likely that there is no whitespace around the hyphens. The &nbsp; is a non-breaking space character when rendered in the browser, but when read by a screen-scraper, it is 6 characters: "&nbsp;".
This means that when you try to match it with a screen-scraper, you should likely try to match the HTML, not the rendered version.
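To make the difference concrete, here is a small plain-Ruby sketch (no Capybara needed). It assumes the page source contains &nbsp; entities and that whatever parses the page decodes each one into U+00A0, the non-breaking space character. The variable names are illustrative only.

```ruby
# The option label as it might appear in the raw HTML source (assumed to use &nbsp;).
html_label = "Investment&nbsp;-&nbsp;2845"

# The same label after an HTML parser decodes each &nbsp; into U+00A0.
decoded_label = "Investment\u00A0-\u00A02845"

# What you typically end up typing after copying the text from the browser.
typed_label = "Investment - 2845"

puts "&nbsp;".length               # prints 6: the entity is 6 characters in the source
puts decoded_label == typed_label  # prints false: U+00A0 is not an ordinary space (U+0020)

# Replacing the non-breaking spaces with ordinary spaces makes the strings equal.
normalized = decoded_label.tr("\u00A0", " ")
puts normalized == typed_label     # prints true
```

So a literal string only matches if it contains exactly the same characters the scraper sees, whether that is the 6-character entity in the HTML or the single U+00A0 character in decoded text.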
The second thing I noticed from your cut'n'paste was that there were tab characters around them. In a regex, tabs and spaces are both matched by the \s character class.
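One caveat worth checking, sketched in Ruby (the language Capybara runs in): Ruby's \s is ASCII-only, so it matches spaces and tabs but not a decoded non-breaking space (U+00A0), while the POSIX class [[:space:]] follows Unicode whitespace and does match it. The sample characters are illustrative, and `match?`/`transform_values` assume Ruby 2.4 or later.

```ruby
samples = { "space" => " ", "tab" => "\t", "nbsp" => "\u00A0" }

results = samples.transform_values do |ch|
  {
    # \s in Ruby covers only ASCII whitespace: space, \t, \r, \n, \f, \v
    ascii:   ch.match?(/\s/),
    # [[:space:]] uses Unicode whitespace, which includes U+00A0
    unicode: ch.match?(/[[:space:]]/)
  }
end

results.each do |name, r|
  puts "#{name}: \\s=#{r[:ascii]}  [[:space:]]=#{r[:unicode]}"
end
```

If the text you are matching has already been decoded (so it contains U+00A0 rather than &nbsp;), [[:space:]] is the safer choice.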
Try this RegEx as a start:
(Investment|Patrick|Lisa)[\s]*(&nbsp;)[-](&nbsp;)[\s]*[0-9]{4}
This starts by matching the word "Investment", "Patrick" or "Lisa", then any amount of whitespace (spaces, tabs, etc.), then the literal "&nbsp;", a dash, the literal "&nbsp;" again, any amount of whitespace again, and finally exactly 4 digits (0-9).
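A quick Ruby check of that pattern against raw option markup. The option strings below are assumptions: they take the question's HTML and suppose the source really contains &nbsp;-&nbsp; between the label and the digits. Note that, as written, the pattern only matches the first option, because "Patrick UGMA" and "Lisa UGMA" have extra text before the first &nbsp;. The `loosened` pattern is my own adjustment, not part of the original suggestion.

```ruby
# The pattern above as a Ruby regex literal ({4} = exactly four digits).
original = /(Investment|Patrick|Lisa)[\s]*(&nbsp;)[-](&nbsp;)[\s]*[0-9]{4}/

# Adjusted variant: allow any text that is not "<" (e.g. " UGMA") between
# the name and the first &nbsp;, and capture the 4-digit suffix as group 4.
loosened = /(Investment|Patrick|Lisa)[^<]*?(&nbsp;)-(&nbsp;)\s*([0-9]{4})/

# Option lines as they might appear in the page source (assumed).
options_html = [
  '<option value="MU:P2" selected="true">Investment&nbsp;-&nbsp;2845</option>',
  '<option value="MU:P0">Patrick UGMA&nbsp;-&nbsp;1585</option>',
  '<option value="MU:P1">Lisa UGMA&nbsp;-&nbsp;1655</option>'
]

original_hits = options_html.map { |line| !line[original].nil? }
loosened_hits = options_html.map { |line| line[loosened, 4] }

p original_hits  # prints [true, false, false]
p loosened_hits  # prints ["2845", "1585", "1655"]
```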
Note: I have not tested this RegEx. However, it should give you a good starting point to build from. I suggest Regular-Expressions.info if you need more help adjusting it.
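If the goal is simply to pick an option by a regex such as /Invest/ or /Pat/, one approach is to normalize the whitespace in each option's text first and then match. The helper below, `pick_option_label`, is hypothetical (not part of Capybara), and the label strings assume the driver hands back decoded text containing U+00A0. It is plain Ruby so it can run without a browser.

```ruby
# Hypothetical helper: return the first label matching `pattern`, after
# collapsing any whitespace (including U+00A0 from &nbsp;) into one plain space.
# The original, un-normalized label is returned so it matches the page exactly.
def pick_option_label(labels, pattern)
  labels.find { |label| label.gsub(/[[:space:]]+/, " ").match?(pattern) }
end

# Option texts as a driver might report them once &nbsp; is decoded (assumed).
labels = [
  "Investment\u00A0-\u00A02845",
  "Patrick UGMA\u00A0-\u00A01585",
  "Lisa UGMA\u00A0-\u00A01655"
]

pat  = pick_option_label(labels, /Pat/)
lisa = pick_option_label(labels, /Lisa UGMA - \d{4}/)
puts pat.tr("\u00A0", " ")
puts lisa.tr("\u00A0", " ")
```

Inside Capybara itself, a similar idea can be expressed with a text filter on the option, e.g. find('#AcctNumber').find('option', text: /Invest/).select_option, which avoids typing the whitespace at all. Check that your Capybara version accepts a Regexp for text: before relying on it.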