alexev31

Reputation: 41

Regex everything after, but not including

I am trying to regex the following string:

https://www.amazon.com/Tapps-Top-Apps-and-Games/dp/B00VU2BZRO/ref=sr_1_3?ie=UTF8&qid=1527813329&sr=8-3&keywords=poop

I want only B00VU2BZRO.

This substring is always 10 alphanumeric characters, preceded by dp/.

So far I have the following regex:

[d][p][\/][0-9B][0-9A-Z]{9}

This matches dp/B00VU2BZRO

I want to match only B00VU2BZRO with no dp/

How do I regex this?

Upvotes: 4

Views: 9233

Answers (2)

Tim Biegeleisen

Reputation: 520968

Here is one regex option which would produce an exact match of what you want:

(?<=dp\/)(.*)(?=\/)


Note that this solution makes no assumptions about the length of the path fragment occurring after dp/. If you want to match a certain number of characters, replace (.*) with (.{10}), for example.
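As a rough illustration in Python (an assumption, since the question does not name a language), the pattern above and its fixed-length variant can be tried with the standard re module. The second URL is made up for this sketch; it only shows that a greedy (.*) runs to the last slash after dp/, so something like [^/]* or (.{10}) may be safer when more path segments follow.

```python
import re

# URL from the question.
url = ("https://www.amazon.com/Tapps-Top-Apps-and-Games/dp/B00VU2BZRO/"
       "ref=sr_1_3?ie=UTF8&qid=1527813329&sr=8-3&keywords=poop")

# Pattern from the answer: lookbehind for "dp/", lookahead for a "/".
# Python does not require the forward slash to be escaped.
any_length = re.search(r"(?<=dp/)(.*)(?=/)", url).group(0)

# Fixed-length variant suggested in the answer.
ten_chars = re.search(r"(?<=dp/)(.{10})(?=/)", url).group(0)

# Hypothetical URL with extra path segments (not from the question).
longer_url = "https://example.com/dp/B00VU2BZRO/ref/extra"

# Greedy .* backtracks only as far as the LAST slash it can find.
greedy = re.search(r"(?<=dp/)(.*)(?=/)", longer_url).group(0)

# [^/]* stops at the first slash after "dp/".
bounded = re.search(r"(?<=dp/)([^/]*)(?=/)", longer_url).group(0)

print(any_length, ten_chars, greedy, bounded)
```

On the URL from the question both forms give the same result, because only one slash follows the code.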

Upvotes: 4

Mad Physicist

Reputation: 114230

Depending on your language/method of application, you have a couple of options.

  1. Positive look behind. This will make your regex more complicated, but will make it match what you want exactly:

    (?<=dp/)[0-9A-Z]{10}
    

    The construct (?<=...) is called a positive look behind. It will not consume any of the string, but will only allow the match to happen if the pattern between the parens matches immediately before the current position.

  2. Capture group. This will make the regex itself slightly simpler, but will add a step to the extraction process:

    dp/([0-9A-Z]{10})
    

    Anything between plain parens is a capture group. The entire pattern will be matched, including dp/, but most languages will give you a way of extracting the portion you are interested in.
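To make the difference between the two options concrete, here is a minimal sketch in Python (chosen only for illustration; the question does not say which language is in use). The variable names are hypothetical. With the look behind, the whole match is the code; with the capture group, the whole match still includes dp/ and the code is pulled out as group 1.

```python
import re

# URL from the question.
url = ("https://www.amazon.com/Tapps-Top-Apps-and-Games/dp/B00VU2BZRO/"
       "ref=sr_1_3?ie=UTF8&qid=1527813329&sr=8-3&keywords=poop")

# Option 1: positive look behind. "dp/" must precede the match
# but is not part of it.
lookbehind = re.compile(r"(?<=dp/)[0-9A-Z]{10}")
code_from_lookbehind = lookbehind.search(url).group(0)

# Option 2: capture group. The full match includes "dp/";
# group(1) holds only the part inside the parentheses.
capture = re.compile(r"dp/([0-9A-Z]{10})")
match = capture.search(url)
whole_match = match.group(0)
code_from_capture = match.group(1)

print(code_from_lookbehind, whole_match, code_from_capture)
```

Either way the extracted value is the same; the capture group simply moves the work from the pattern to the extraction step.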

Depending on your language, you may need to escape the forward slash (/).

As an aside, you never need to create a character class for single characters: [d][p][\/] can equally well be written as just dp\/.
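A quick way to convince yourself of this is to run both spellings against the URL from the question. The sketch below uses Python, which is an assumption on my part; there the slash needs no escaping at all, although escaping it is harmless.

```python
import re

# URL from the question.
url = ("https://www.amazon.com/Tapps-Top-Apps-and-Games/dp/B00VU2BZRO/"
       "ref=sr_1_3?ie=UTF8&qid=1527813329&sr=8-3&keywords=poop")

# Original pattern, with a character class around each single character.
verbose = re.compile(r"[d][p][\/][0-9B][0-9A-Z]{9}")

# Same pattern with the single-character classes written as literals.
plain = re.compile(r"dp/[0-9B][0-9A-Z]{9}")

verbose_match = verbose.search(url).group(0)
plain_match = plain.search(url).group(0)

print(verbose_match, plain_match)
```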

Upvotes: 1
