sa_nyc

Reputation: 981

Pattern Compiler for span html tag

Hi, I am working on a cloud computing project on Amazon. The part of the code where I am stuck is getting a user's wish list from Amazon. Since there are permission restrictions, I extracted the entire page source from the wish list URL. To extract the item ID, I used Pattern.compile like this:

Pattern p = Pattern.compile("/dp/(\\w+)/");
Matcher matcher = p.matcher(content);

This was easy, and it now correctly lists all the products in that wish list with their item IDs. I also need the price for each. According to the page source, the price is:

<span class="a-size-base a-color-price a-text-bold">
                      $7.19
                    </span>

I need to write a pattern for this one, and I am confused and stuck; I am not good with regular expressions. Could anyone help, please? I saw online references for href, but I don't think that will work for me.

Thanks to dkatzel, I found this tool, Jsoup. I tried the online tool at Online Jsoup Try, and when I use the CSS query div I get the required output. But how do I hard-code it in my Java program? I have the jsoup jar.

Upvotes: 0

Views: 525

Answers (2)

Daniel B

Reputation: 8879

An alternative answer where Jsoup is used.

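Document doc = Jsoup.parse(content); // content is the page source string
Element e = doc.select("span.a-size-base").first(); // first matching span, or null if none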

Include jsoup-1.x.x.jar in your project or when you compile, and add the following imports.

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
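
To show how this could be hard-coded in a Java program, here is a minimal sketch that collects the text of every price span. It assumes the jsoup jar is on the classpath and that the page source is already available as a string. The class name WishListPrices and the method extractPrices are hypothetical names chosen for illustration, and the selector span.a-color-price is an assumption based on the class attribute shown in the question; the real page markup may differ.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of jsoup.
class WishListPrices {

    // Returns the trimmed text of every <span> whose class list contains "a-color-price".
    static List<String> extractPrices(String html) {
        Document doc = Jsoup.parse(html);
        List<String> prices = new ArrayList<>();
        // The selector is an assumption taken from the snippet in the question.
        for (Element span : doc.select("span.a-color-price")) {
            prices.add(span.text()); // text() normalizes and trims whitespace
        }
        return prices;
    }

    public static void main(String[] args) {
        String html = "<span class=\"a-size-base a-color-price a-text-bold\">\n"
                + "                      $7.19\n"
                + "                    </span>";
        System.out.println(extractPrices(html));
    }
}
```

Selecting on a-color-price rather than a-size-base is a judgment call: the size class looks likely to appear on non-price elements too, so the narrower class should produce fewer false matches.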

Upvotes: 3

Jerry

Reputation: 71578

Wouldn't a simple expression work?

\\$\\d+(?:\\.\\d+)?

\\$ matches a literal $.

\\d+ matches one or more digits.

(?:\\.\\d+)? matches an optional decimal part.

The whole match is probably what you're looking for. If you don't need the dollar symbol, you can use either a capture group and take the first group (i.e. \\$(\\d+(?:\\.\\d+)?)) or a lookbehind (i.e. (?<=\\$)\\d+(?:\\.\\d+)?).
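
For reference, a minimal sketch of the capture-group variant using only java.util.regex. The class name PriceRegexDemo and the method extractPrices are hypothetical names for illustration, and the decimal part is treated as optional here so that whole-dollar prices also match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for the capture-group approach.
class PriceRegexDemo {

    // Literal '$', one or more digits, then an optional decimal part.
    // Group 1 holds the amount without the dollar sign.
    private static final Pattern PRICE = Pattern.compile("\\$(\\d+(?:\\.\\d+)?)");

    // Collects every dollar amount found in the given page source.
    static List<String> extractPrices(String content) {
        List<String> prices = new ArrayList<>();
        Matcher matcher = PRICE.matcher(content);
        while (matcher.find()) {
            prices.add(matcher.group(1));
        }
        return prices;
    }

    public static void main(String[] args) {
        String content = "<span class=\"a-size-base a-color-price a-text-bold\">\n"
                + "                      $7.19\n"
                + "                    </span>";
        System.out.println(extractPrices(content)); // prints [7.19]
    }
}
```

Note that this matches any dollar amount anywhere in the page source, not only those inside the price span, so the results may need filtering if the page shows other prices.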

Upvotes: 1
