Mio

Reputation: 1502

In regex capture group, exclude one word

I have URLs of this type:

https://example.com/en/app/893245
https://example.com/ru/app/wq23245
https://example.com/app/8984245

I want to extract only the word between com and app:

https://example.com/en/app/893245 -> en
https://example.com/ru/app/wq23245 -> ru
https://example.com/app/8984245 ->

I tried to exclude app from the capture group, but I don't know how to do it other than like this:

.*com\/((?!app).*)\/app

Is it possible to do something like this, but excluding the word app from being captured? example\.com\/(\w+|?!app)\/

Rubular link: https://rubular.com/r/NnojSgQK7EuelE

Upvotes: 1

Views: 505

Answers (2)

malarres

Reputation: 2946

You could use sed:

sed -n -f script.sed yourinput.txt

and inside script.sed:

s/.*com\/\(.*\)\/app.*/\1/p

Example input:

https://example.com/en/app/893245
https://example.com/ru/app/wq23245
https://example.com/app/8984245

Example output:

$ sed -n -f script.sed yourinput.txt
en
ru
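
For readers without sed at hand, roughly the same filter can be sketched in Ruby. This is only an illustration: the variable names lines and result are made up here, the pattern is the sed expression rewritten with Ruby's unescaped group syntax, and filter_map assumes Ruby 2.7 or later.

```ruby
lines = [
  'https://example.com/en/app/893245',
  'https://example.com/ru/app/wq23245',
  'https://example.com/app/8984245'
]

# String#[] with a regex and a group index returns the captured text,
# or nil when the pattern does not match. filter_map drops the nils,
# which mirrors sed -n with the p flag (print only substituted lines).
result = lines.filter_map { |line| line[/.*com\/(.*)\/app.*/, 1] }

puts result
```

As with the sed script, the third URL produces no output line because there is nothing between com/ and /app to match.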

Upvotes: 0

Wiktor Stribiżew

Reputation: 627537

If you need a plain regex you may use lookarounds:

/(?<=example\.com\/)\w+(?=\/app)/

Or, probably better in the context of a URL:

/(?<=example\.com\/)[^\/]+(?=\/app)/

See the Rubular demo.
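
As a minimal sketch of how the lookaround pattern behaves in Ruby (the variable names lang_segment, urls and segments are chosen here for illustration only), the whole match is already the wanted segment, so no capture group index is needed:

```ruby
# The lookbehind and lookahead assert the surrounding text without consuming it,
# so the match is just the path segment between example.com/ and /app.
lang_segment = /(?<=example\.com\/)[^\/]+(?=\/app)/

urls = [
  'https://example.com/en/app/893245',
  'https://example.com/ru/app/wq23245',
  'https://example.com/app/8984245'
]

# String#[] with a regex returns the matched text, or nil if there is no match.
segments = urls.map { |url| url[lang_segment] }

p segments
```

For the third URL the only candidate segment is app itself, which is not followed by /app, so the result for that entry is nil.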

In Ruby, you may use

strs = ['https://example.com/en/app/893245','https://example.com/ru/app/wq23245','https://example.com/app/8984245']
strs.each { |s|
    p s[/example\.com\/(\w+)\/app/, 1]
}
# prints "en", "ru" and nil, one per line

Upvotes: 2
