ddoherty

Reputation: 375

Capybara select from dropdown with regular expression?

I am trying to scrape a web page with Capybara, which is working fine, except that I am having trouble with a certain page. It has a dropdown list defined with the following HTML:

<select onChange="this.form.submit();" id="AcctNumber" name="AcctNum">
<option value="MU:P2" selected="true">Investment &nbsp;-&nbsp;2845</option>
<option value="MU:P0">Patrick UGMA&nbsp;-&nbsp;1585</option>
<option value="MU:P1">Lisa UGMA&nbsp;-&nbsp;1655</option>

I have tried to select a value with many variations on this theme:

selector = 'Investment - 2845'
selector = 'Investment &nbsp;-&nbsp; 2845'
selector = 'Investment    &nbsp;-&nbsp;   2845'
select selector, :from => "AcctNumber"

all of which (and many more) produce ElementNotFound errors.

Is there a way to just use a regular expression, say /Invest/ or /Pat/ or /Lisa/ to select the item? It sure would be easier than trying to guess what literal string will match the mysterious whitespace around those hyphens.

Upvotes: 4

Views: 941

Answers (1)

Troy Alford

Reputation: 27236

Your problem is likely that there is no literal white-space around the hyphens. In a browser, &nbsp; renders as a non-breaking space, but in the raw HTML that a screen-scraper reads it is the six characters "&nbsp;".

This means that when you try to match it with a screen-scraper, you should likely try to match the HTML, not the rendered version.

The second thing I noticed from your cut'n'paste is that there appear to be tab characters around them. In RegEx, tabs and spaces are both matched by the \s character class.
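To see what a pattern is actually up against, here is a small plain-Ruby sketch (no Capybara involved). It assumes the option text could reach your code in one of two forms: with the entity left as raw HTML, or with it decoded to the non-breaking space character U+00A0. Which form a given driver returns is not confirmed here, so the variable names and both sample strings are illustrative only.

```ruby
# Two hypothetical forms of the same option label.
raw     = "Investment &nbsp;-&nbsp;2845"    # entity left as literal characters
decoded = "Investment \u00A0-\u00A02845"    # entity decoded to U+00A0

entity_length = "&nbsp;".length             # length of the literal entity text
nbsp_length   = "\u00A0".length             # length of the decoded character

# \s covers ordinary spaces and tabs...
tab_and_space_match = "\t ".match?(/\A\s+\z/)

# ...but in Ruby it does not cover U+00A0; the POSIX bracket class does.
nbsp_matches_s     = "\u00A0".match?(/\s/)
nbsp_matches_posix = "\u00A0".match?(/[[:space:]]/)

# A pattern that tolerates either form around the hyphen.
either_form = /Investment(?:\s|&nbsp;|[[:space:]])*-(?:\s|&nbsp;|[[:space:]])*\d{4}/
raw_ok      = raw.match?(either_form)
decoded_ok  = decoded.match?(either_form)
```

If the text turns out to be decoded, a literal "&nbsp;" in the pattern will never match, and \s alone will not either; [[:space:]] is the safer choice in that case.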

Try this RegEx as a start:

(Investment|Patrick|Lisa)[\s]*(&nbsp;)[-](&nbsp;)[\s]*[0-9]{4}

This starts by matching the word "Investment" OR "Patrick" OR "Lisa", then any amount of white-space (spaces, tabs, etc.), then the literal "&nbsp;", a dash, the literal "&nbsp;" again, any amount of white-space again, and then exactly four digits 0-9.
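As a rough check, the pattern can be run against the three option labels from the question in plain Ruby. This sketch assumes the final quantifier was meant to be {4} (four digits) and that the labels arrive as raw HTML with the entity intact. LABELS, STRICT, LOOSE and matching_labels are hypothetical names introduced for illustration, and LOOSE is my own variant, not part of the pattern above.

```ruby
# Option labels copied from the question's HTML source.
LABELS = [
  'Investment &nbsp;-&nbsp;2845',
  'Patrick UGMA&nbsp;-&nbsp;1585',
  'Lisa UGMA&nbsp;-&nbsp;1655'
].freeze

# The answer's pattern, with the digit count written as {4}.
STRICT = /(Investment|Patrick|Lisa)[\s]*(&nbsp;)[-](&nbsp;)[\s]*[0-9]{4}/

# Hypothetical looser variant: allow any text (such as " UGMA")
# between the name and the first entity.
LOOSE = /(Investment|Patrick|Lisa).*?&nbsp;-&nbsp;\s*[0-9]{4}/

# Return the labels that the given pattern matches.
def matching_labels(labels, pattern)
  labels.select { |label| label.match?(pattern) }
end

strict_hits = matching_labels(LABELS, STRICT)
loose_hits  = matching_labels(LABELS, LOOSE)
```

Here strict_hits contains only the "Investment" label, because "UGMA" sits between the name and the entity in the other two; loose_hits contains all three. With Capybara, one could presumably walk find('#AcctNumber').all('option'), test each element's text against such a pattern, and call select_option on the first match, though that is untested against this page.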

Note: I have not tested this RegEx. However, it should give you a good starting point to build from. I suggest Regular-Expressions.info if you need more help adjusting it.

Upvotes: 0
