meiryo

Reputation: 11687

Trouble web scraping data from a tag's inline style attribute

So I have a couple of spans with inline styles:

<span style="...;width:8px;..."></span>
<span style="...;width:16px;..."></span>
<span style="...;width:13px;..."></span>
<span style="...;width:20px;..."></span>
<span style="...;width:0px;..."></span> //width=0px
<span style="...;width:5px;..."></span>
<span style="...;width:3px;..."></span>
<span style="...;width:90px;..."></span>
<span style="...;width:200px;..."></span>

I want to extract the numeric pixel value of each width and store it in an array. A span with width:0px marks the end of the current array, so the spans above should produce:

array1 = [8, 16, 13, 20]

array2 = [5, 3, 90, 200]

An ArrayList of integer arrays could be used to store the data.

What I have so far is very basic:

Elements spanWidths = doc.select("span");

So far this produces: "border:...;width:8px;..."

I believe a regex is the way to solve this, but I'm not very familiar with them. Any help?

Upvotes: 0

Views: 108

Answers (1)

marcus erronius

Reputation: 3693

The regex would be \bwidth\s*:\s*(\d+)px. Then take the value from the first capture group, that is, call .group(1) on the resulting match.
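As a rough sketch of how that regex could be combined with the grouping rule from the question, the following Java uses only java.util.regex. The class name WidthGroups, the helper splitWidths, and the sample style strings are hypothetical stand-ins, not taken from the original page. It assumes the style strings have already been collected, for example with span.attr("style") on each element returned by doc.select("span"). Note that the backslashes have to be doubled inside a Java string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WidthGroups {
    // The regex from the answer; group 1 holds the digits before "px".
    private static final Pattern WIDTH =
            Pattern.compile("\\bwidth\\s*:\\s*(\\d+)px");

    // Hypothetical helper: turns a list of raw style strings into groups of
    // widths, starting a new group whenever a width of 0 is seen.
    static List<List<Integer>> splitWidths(List<String> styles) {
        List<List<Integer>> groups = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        for (String style : styles) {
            Matcher m = WIDTH.matcher(style);
            if (!m.find()) {
                continue; // no width declaration in this style, skip it
            }
            int width = Integer.parseInt(m.group(1));
            if (width == 0) {
                // width:0px marks the end of the current group
                if (!current.isEmpty()) {
                    groups.add(current);
                    current = new ArrayList<>();
                }
            } else {
                current.add(width);
            }
        }
        if (!current.isEmpty()) {
            groups.add(current); // the last group has no trailing 0px marker
        }
        return groups;
    }

    public static void main(String[] args) {
        // Simplified sample styles; the real ones contain other properties too.
        List<String> styles = List.of(
                "width:8px;", "width:16px;", "width:13px;", "width:20px;",
                "width:0px;",
                "width:5px;", "width:3px;", "width:90px;", "width:200px;");
        System.out.println(splitWidths(styles));
        // prints [[8, 16, 13, 20], [5, 3, 90, 200]]
    }
}
```

One caveat with this sketch: \b also matches after a hyphen, so a style that declares max-width or min-width before width would have that value picked up first by find(). If the real styles contain such properties, a stricter pattern may be needed.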

Upvotes: 2
