GonzaloXavier

Reputation: 158

Extracting contents of <option> tags in R

I'm trying to extract the text inside these <option> blocks.

What I've tried so far is lookbehinds and lookaheads:

(?s)(?<=option value=\"\d).*?(?=<\/option)
(?s)(?<=option value=\"[0-9]).*?(?=<\/option)

However, the value numbers vary in length, and I can't figure out how to match a variable number of digits inside a lookbehind.

Example:

<option value="140">First string I wanna get</option> <option value="6070">Another string I want</option> <option value="20">This is interesting</option>

Upvotes: 0

Views: 48

Answers (1)

Federico Piazza

Reputation: 31035

I would use XPath with an expression like /option or //option, depending on what you need.
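As a rough sketch of the XPath route in R, something like the following should work. It assumes the xml2 package is installed, and it wraps the sample markup from the question in a <select> element so the parser sees a complete fragment; both of those are my assumptions, not something from the question.

```r
# Assumes the xml2 package is available (install.packages("xml2")).
library(xml2)

# Sample markup from the question, wrapped in <select> (an assumption)
# so the HTML parser gets a well-formed fragment.
html <- paste0(
  "<select>",
  '<option value="140">First string I wanna get</option> ',
  '<option value="6070">Another string I want</option> ',
  '<option value="20">This is interesting</option>',
  "</select>"
)

doc <- read_html(html)

# //option selects every <option> element anywhere in the document.
options <- xml_find_all(doc, "//option")

# xml_text() returns the text content; xml_attr() returns an attribute.
texts  <- xml_text(options)
values <- xml_attr(options, "value")
```

Here texts holds the three strings and values holds the matching value attributes, so there is no need to worry about how many digits each value has.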

However, if you want to use a regex, then you can use a regex with capturing group like this:

<option.*?>(.*?)</option>
or
<option[^>]+>(.*?)</option>
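For reference, here is a minimal base R sketch of how the second pattern could be applied. The variable names are just placeholders, and the input is the example string from the question. Since regmatches() returns whole matches rather than groups, the captured group is pulled out with sub() in a second step.

```r
# Example input from the question.
html <- '<option value="140">First string I wanna get</option> <option value="6070">Another string I want</option> <option value="20">This is interesting</option>'

# Capturing-group pattern from the answer above.
pattern <- "<option[^>]+>(.*?)</option>"

# Step 1: collect every complete <option ...>...</option> match.
matches <- regmatches(html, gregexpr(pattern, html, perl = TRUE))[[1]]

# Step 2: replace each match with its first captured group,
# i.e. the text between the opening and closing tags.
texts <- sub(pattern, "\\1", matches, perl = TRUE)

print(texts)
# [1] "First string I wanna get" "Another string I want"    "This is interesting"
```

If the option text can span several lines, prefix the pattern with (?s) as in the question so that . also matches newlines.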


Upvotes: 1
