Maca

Reputation: 1689

How do I extract part of a URL with a regex?

I need help with a regex (PCRE). I want to extract the hello-world part from each of the URLs below. This is what I have so far:

^/news/(.*?)/$

https://www.example.com/news/2017-08-09/hello-world/topics/

https://www.example.com/news/2017-08-09/hello-world/gallery/

https://www.example.com/news/2017-08-09/hello-world/

But this captures 2017-08-09/hello-world/topics, and I only need hello-world.

Upvotes: 1

Views: 58

Answers (3)

anubhava

Reputation: 785481

You can use this regex in PCRE:

~/news/[^/]*/\K[^/]+~
  • /news/[^/]*/: Match /news/, then zero or more non-/ characters, then /
  • \K: Reset the start of the match, so everything matched so far is dropped from the reported result
  • [^/]+: Match one or more non-/ characters

You may also use a capturing group:

/news/[^/]*/([^/]+)

and extract capturing group #1
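
For anyone trying this outside PCRE, here is a minimal Python sketch of the capturing-group variant. Python's built-in re module does not support \K, so only the group form is shown. The helper name extract_slug is hypothetical and just for illustration.

```python
import re

# Capturing-group form of the pattern: skip one path segment after /news/,
# then capture the segment that follows it.
SLUG_RE = re.compile(r"/news/[^/]*/([^/]+)")

def extract_slug(url):
    """Return the segment after /news/<segment>/, or None if nothing matches."""
    match = SLUG_RE.search(url)
    return match.group(1) if match else None

urls = [
    "https://www.example.com/news/2017-08-09/hello-world/topics/",
    "https://www.example.com/news/2017-08-09/hello-world/gallery/",
    "https://www.example.com/news/2017-08-09/hello-world/",
]
for url in urls:
    print(extract_slug(url))  # prints hello-world for each URL
```

Because the pattern is not anchored, it works on the full URL as well as on the path alone.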

Upvotes: 2

anvita surapaneni

Reputation: 369

[0-9]{4}-[0-9]{2}-[0-9]{2}/(.*?)/

Capture group 1 contains hello-world.

https://regex101.com/r/wFM7nc/1
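
A rough Python sketch of the same date-based pattern, in case it helps to see it run. The function name slug_after_date is made up for this example. Note that the pattern expects a slash after the captured segment, so a URL that ends right after hello-world would not match.

```python
import re

# Match a YYYY-MM-DD date segment, then lazily capture up to the next slash.
DATE_SLUG_RE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}/(.*?)/")

def slug_after_date(url):
    """Return the segment that follows the date segment, or None."""
    match = DATE_SLUG_RE.search(url)
    return match.group(1) if match else None

print(slug_after_date("https://www.example.com/news/2017-08-09/hello-world/topics/"))
print(slug_after_date("https://www.example.com/news/2017-08-09/hello-world/"))
```

Both calls print hello-world.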

Upvotes: 0

Killer Death

Reputation: 459

If hello-world stands for unknown text and the rest is fixed, try this:

^/news/2017-08-09/(.*?)/.*$

If the date is not fixed, you can specify the format it is in and use that instead, for example \d{4}-\d{2}-\d{2} or whatever you need.
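
A small Python sketch of the variable-date version, assuming a YYYY-MM-DD date. Since the pattern starts with ^/news/, it has to be applied to the URL path rather than the full URL, so this example pulls the path out with urllib.parse.urlsplit first. The name slug_from_url is hypothetical.

```python
import re
from urllib.parse import urlsplit

# Anchored pattern with the fixed date replaced by a date format.
PATH_RE = re.compile(r"^/news/\d{4}-\d{2}-\d{2}/(.*?)/.*$")

def slug_from_url(url):
    """Return the segment after /news/<date>/ in the URL path, or None."""
    path = urlsplit(url).path  # e.g. /news/2017-08-09/hello-world/topics/
    match = PATH_RE.match(path)
    return match.group(1) if match else None

print(slug_from_url("https://www.example.com/news/2017-08-09/hello-world/topics/"))
```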

Upvotes: 0
