Reputation: 1502
I have URLs of this type:
https://example.com/en/app/893245
https://example.com/ru/app/wq23245
https://example.com/app/8984245
I want to extract only the word between com and app:
https://example.com/en/app/893245 -> en
https://example.com/ru/app/wq23245 -> ru
https://example.com/app/8984245 ->
I tried to exclude app from the capture group, but I don't know how to do it other than like this:
.*com\/((?!app).*)\/app
Is it possible to do something like this, but excluding the word app from being captured? example\.com\/(\w+|?!app)\/
Rubular link: https://rubular.com/r/NnojSgQK7EuelE
Upvotes: 1
Views: 505
Reputation: 2946
You could use sed:
sed -n -f script.sed yourinput.txt
and inside script.sed:
s/.*com\/\(.*\)\/app.*/\1/p
Example input:
https://example.com/en/app/893245
https://example.com/ru/app/wq23245
https://example.com/app/8984245
Example output:
$ sed -n -f script.sed yourinput.txt
en
ru
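One caveat with the greedy `\(.*\)` capture: it is not limited to a single path segment. The Ruby sketch below applies a pattern of the same shape to a made-up URL with an extra segment (an assumption, not one of the inputs from the question) and compares it with a capture restricted to non-slash characters.

```ruby
# Hypothetical input with an extra path segment before /app.
url = 'https://example.com/en/foo/app/1'

# Same shape as the sed pattern: the greedy (.*) spans several segments.
greedy = url[/.*com\/(.*)\/app.*/, 1]

# Restricting the capture to non-slash characters allows exactly one segment,
# so this URL no longer matches at all.
single = url[/com\/([^\/]+)\/app/, 1]

p greedy
p single
```

If only one segment should ever be captured, the `[^\/]` form is the safer choice; in sed the equivalent capture would be `\([^\/]*\)`.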
Upvotes: 0
Reputation: 627537
If you need a plain regex, you may use lookarounds:
/(?<=example\.com\/)\w+(?=\/app)/
Or, probably better in the context of a URL:
/(?<=example\.com\/)[^\/]+(?=\/app)/
See the Rubular demo.
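As a minimal sketch of how the lookaround version behaves in Ruby, the snippet below runs it over the three URLs from the question. The variable names `urls`, `pattern`, and `results` are illustrative only. Because the lookbehind and lookahead consume nothing, the whole match is already the wanted word and no capture group is needed.

```ruby
urls = [
  'https://example.com/en/app/893245',
  'https://example.com/ru/app/wq23245',
  'https://example.com/app/8984245'
]

# The lookbehind requires "example.com/" just before the match,
# the lookahead requires "/app" just after it.
pattern = /(?<=example\.com\/)[^\/]+(?=\/app)/

# String#[] with a regex returns the matched text, or nil when nothing matches.
results = urls.map { |u| u[pattern] }
p results
```

The third URL yields nil because the segment right after `example.com/` is `app` itself, which is not followed by `/app`.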
In Ruby, you may use
strs = ['https://example.com/en/app/893245','https://example.com/ru/app/wq23245','https://example.com/app/8984245']
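strs.each do |s|
  # The second argument, 1, returns capture group 1 (nil when there is no match)
  p s[/example\.com\/(\w+)\/app/, 1]
end
# prints "en", "ru" and nil, one per line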
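If a regex is not a hard requirement, a different technique is to parse the URL and look at its path segments. This is only a sketch using Ruby's standard `uri` library, not the approach from the answer above; the helper name `segment_before_app` is hypothetical.

```ruby
require 'uri'

# Hypothetical helper: returns the path segment directly before "app",
# or nil when "app" is missing or is the first segment.
def segment_before_app(url)
  segments = URI.parse(url).path.split('/').reject(&:empty?)
  idx = segments.index('app')
  return nil if idx.nil? || idx.zero?

  segments[idx - 1]
end

p segment_before_app('https://example.com/en/app/893245')
p segment_before_app('https://example.com/app/8984245')
```

This avoids depending on the host name in the pattern, at the cost of a little more code.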
Upvotes: 2