Reputation: 1343
I am trying to write a regular expression to extract URLs, which have endpoints with the following format:
https://api.siteurl.com/id/a1b2c3d4/apps
https://api.siteurl.com/id/a1b2c3d4/devices
...
etc
The id in these URLs is a1b2c3d4, and it can differ between URLs, but I want to extract the text that surrounds it.
The following regular expression matches the entire string:
https:\/\/\S+\.\S+\.com\/id\/\S+\/\S+
However, I don't want to extract the id itself, and just want to use it as a lookahead.
The final extracted string should be like 'https://api.siteurl.com/id' ... 'apps', where the ... is not actually extracted.
Is it only possible to do this using 2 regexes, where each uses a look-ahead and a look-behind, or can a single expression be used to extract just the relevant parts of the url?
Upvotes: 0
Views: 863
Reputation: 163227
You could use 2 capturing groups to capture the data that you want to keep, and match the data that you don't want to keep.
(https:\/\/\S+\.\S+\.com\/id)\/[^\/]+\/(\S+)
(                              Capture group 1
https:\/\/\S+\.\S+\.com\/id    Match from the start of the string up to id, without the trailing /
)                              Close group 1
\/                             Match the / that follows
[^\/]+\/                       Match 1+ times any char except /, then match /
(\S+)                          Capture group 2: match 1+ times a non-whitespace char

This is the pattern from the comment without the non-capturing group (?: as it is unnecessary.
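As a minimal sketch of how the two groups could be read back, here is the same pattern used from Python. The choice of Python and the helper name extract_parts are assumptions for illustration, since the question does not say which regex engine is in use. In Python the forward slashes do not need escaping.

```python
import re

# Same pattern as above; "/" needs no escaping in a Python raw string.
URL_PATTERN = re.compile(r"(https://\S+\.\S+\.com/id)/[^/]+/(\S+)")


def extract_parts(url):
    """Return (prefix, endpoint) without the id segment, or None if no match.

    extract_parts is a hypothetical helper name, not part of any library.
    """
    match = URL_PATTERN.search(url)
    if match is None:
        return None
    # Group 1 is everything up to "id"; group 2 is the part after the id.
    return match.group(1), match.group(2)


if __name__ == "__main__":
    for url in (
        "https://api.siteurl.com/id/a1b2c3d4/apps",
        "https://api.siteurl.com/id/a1b2c3d4/devices",
    ):
        print(extract_parts(url))
```

The id is matched by [^/]+ but sits outside both groups, so it is consumed without being captured. If a single string is wanted, the two groups can be joined afterwards.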
Upvotes: 1