Reputation: 41
I'm trying to convert this example:
Some Nice Article on amazon https://www.amazon.de/gp/product/ADKLHJADK/ref=as_li_ss_tl?ie=UTF8&pd_rd_i=B01J7LLL9Q&pd_rd_r=a8c7bb4b-49da-11e8-ad28-014ae5dc2f42&pd_rd_w=9QOk2&pd_rd_wg=zc1s7&pf_rd_m=A3JWKAKR8XB7XF&pf_rd_s=&pf_rd_r=VF3C7MDNZ741H8S13AYV&pf_rd_t=36701&pf_rd_p=1c175abe-9bc7-490b-bbe1-2caf7e752c98&pf_rd_i=desktop&linkCode=ll1
to this:
https://www.amazon.de/gp/product/YXZ91ALI91/
What is the correct or best way to handle this in Java with a regex? My attempt looks really dirty:
https://www.amazon.de/gp/product/[A-Za-z0-9][A-Za-z0-9][A-Za-z0-9][A-Za-z0-9][A-Za-z0-9][A-Za-z0-9][A-Za-z0-9][A-Za-z0-9][A-Za-z0-9][A-Za-z0-9]/
Working solution for extracting an Amazon link:
The part before the | covers links copied from a desktop browser; the part after it covers links copied with the share button of the mobile app.
https://www.amazon.de/gp/product/[^/]+/?|https://www.amazon.de/dp/[^/]+/
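For reference, here is a minimal sketch of how the two alternatives above could be folded into one pattern and applied with Matcher.find(). The class name AmazonLinkExtractor and the method extract are made up for illustration, and the sketch assumes the ID ends at the next slash, question mark, or whitespace.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not from the original post.
class AmazonLinkExtractor {
    // Matches either /gp/product/<id> or /dp/<id>, with an optional trailing slash.
    // Assumption: the ID runs until the next '/', '?' or whitespace character.
    private static final Pattern LINK = Pattern.compile(
            "https://www\\.amazon\\.de/(?:gp/product|dp)/[^/?\\s]+/?");

    // Returns the first link found in the text, or an empty Optional if there is none.
    static Optional<String> extract(String text) {
        Matcher m = LINK.matcher(text);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        String desktop = "Some Nice Article on amazon "
                + "https://www.amazon.de/gp/product/ADKLHJADK/ref=as_li_ss_tl?ie=UTF8";
        System.out.println(extract(desktop).orElse("no link found"));
    }
}
```

Using find() instead of matches() means the surrounding text does not have to be described by the pattern at all, which keeps the expression short.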
Upvotes: 1
Views: 65
Reputation: 2210
One-line solution:
String result = myStrValue.replaceAll(".*(https://www\\.amazon\\.de/gp/product/\\w+/).*", "$1");
\w+ matches one or more word characters, i.e. [a-zA-Z_0-9].
You can try the Java regex here: https://www.freeformatter.com/java-regex-tester.html#ad-output
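One thing worth knowing about the replaceAll approach: when the input contains no matching link, replaceAll returns the input unchanged rather than an empty string. A small sketch of that behaviour follows; the class name ReplaceAllDemo and the helper names shorten and containsLink are hypothetical.

```java
// Hypothetical demo class for illustration; not from the original answer.
class ReplaceAllDemo {
    // The one-liner's regex, written as a Java string literal (backslashes doubled).
    static final String REGEX = ".*(https://www\\.amazon\\.de/gp/product/\\w+/).*";

    // Replaces the whole input with capture group 1 when the regex matches.
    static String shorten(String text) {
        return text.replaceAll(REGEX, "$1");
    }

    // Lets the caller tell "link extracted" apart from "input returned unchanged".
    static boolean containsLink(String text) {
        return text.matches(REGEX);
    }

    public static void main(String[] args) {
        String withLink = "see https://www.amazon.de/gp/product/ADKLHJADK/ref=x";
        String withoutLink = "no link here";
        System.out.println(shorten(withLink));     // https://www.amazon.de/gp/product/ADKLHJADK/
        System.out.println(shorten(withoutLink));  // no link here
        System.out.println(containsLink(withoutLink)); // false
    }
}
```

If the caller needs to know whether a link was actually found, a check like containsLink, or a Matcher with find(), is probably the safer route.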
Upvotes: 0
Reputation: 4403
There are many approaches. This one assumes that the part you want is the path segment immediately after /product/.
Pattern pat = Pattern.compile("^.*(https://.*/product/[^\\/]*?/).*");
Example:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The class name is arbitrary; any name works.
public class ProductLinkExample {
    public static void main(String[] args) {
        String inp = "Some Nice Article on amazon "
                + "https://www.amazon.de/gp/product/ADKLHJADK/ref=as_li_ss_tl"
                + "?ie=UTF8&pd_rd_i=B01J7LLL9Q&pd_rd_r"
                + "=a8c7bb4b-49da-11e8-ad28-014ae5dc2f42&pd_rd_w"
                + "=9QOk2&pd_rd_wg=zc1s7&pf_rd_m=A3JWKAKR8XB7XF&pf_rd_s=&pf_rd_r"
                + "=VF3C7MDNZ741H8S13AYV&pf_rd_t="
                + "36701&pf_rd_p=1c175abe-9bc7-490b-bbe1-2caf7e752c98&pf_rd_i"
                + "=desktop&linkCode=ll1";
        // Group 1 captures from "https://" up to the slash that follows the product ID.
        Pattern pat = Pattern.compile("^.*(https://.*/product/[^\\/]*?/).*");
        Matcher m = pat.matcher(inp);
        // matches() succeeds only if the whole input fits the pattern.
        if (m.matches()) {
            System.out.println(m.group(1));
        }
    }
}
The idea is to find the start of "https://", then anything, then "product/", then everything up to the next "/".
Resultant output:
https://www.amazon.de/gp/product/ADKLHJADK/
Upvotes: 1
Reputation: 2727
Your regex will look like this:
https:\/\/www.amazon.de\/gp\/product\/[^\/]+\/?
[^\/] means "any character that is not a slash".
You can test it here: https://regex101.com/r/wwFmMw/1
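A note on carrying this kind of pattern over to Java: forward slashes need no backslash there, because Java regexes have no delimiters, but a dot that should match only a literal dot does need escaping, written as \\. inside a string literal. The sketch below illustrates the difference; the class name DotEscapeDemo and the lookalike host are invented for the example.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for illustration; not from the original answer.
class DotEscapeDemo {
    // Unescaped dots: each '.' matches any single character.
    static final Pattern LOOSE =
            Pattern.compile("https://www.amazon.de/gp/product/[^/]+/?");
    // Escaped dots: each "\\." matches only a literal dot.
    static final Pattern STRICT =
            Pattern.compile("https://www\\.amazon\\.de/gp/product/[^/]+/?");

    public static void main(String[] args) {
        // Invented host in which both dots are replaced by other characters.
        String lookalike = "https://wwwXamazonYde/gp/product/ABC/";
        System.out.println(LOOSE.matcher(lookalike).find());  // true
        System.out.println(STRICT.matcher(lookalike).find()); // false
    }
}
```

For a link cleaner this rarely matters in practice, but the escaped form states the intent more precisely.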
Upvotes: 3