How to get string starting and ending with something, containing a substring?

Question

I'm new to regex and I am trying to grab URLs from a large HTML text file. The links are "trapped" in strings of the following kind:

,"link_value":"https://www.linkedin.com/company/randomcompanyA"},"event":"link_click&

I want to write a regex line that will get me any string starting and ending with ", containing linkedin or instagram etc. In other words, I want to grab strings/links by defining a substring in that link, so I do not want a general line returning all links in a file. So far I've been able to write the following:

(?<=").+?(?=")

But I'm not able to work in the 'contains linkedin' part in there. The above command would therefore also return link_value, for example.

Any help is appreciated!

anubhava · Accepted Answer

Since you're already using lookarounds, you can make your regex more specific by starting the match with http:// or https://, like this:

(?<=")https?:\/\/[^\/]*?\b(?:linkedin|instagram)\.\S+?(?=")

RegEx Details:

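(?<=") is a lookbehind asserting that the match is immediately preceded by "
https?:\/\/ matches http:// or https://
[^\/]*? matches 0 or more of any character that is not / (lazy), so the search stays inside the host part of the URL
\b(?:linkedin|instagram)\. matches linkedin or instagram at a word boundary, followed by a literal dot
\S+? matches 1 or more of any character that is not whitespace (lazy)
(?=") is a lookahead asserting that the match is immediately followed by "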
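
For anyone who wants to try this from a script, here is a minimal sketch in Python (the question does not say which language or tool is being used, so that is an assumption). The names LINK_PATTERN, find_links and sample are made up for illustration, and the example.com and instagram entries in the sample text are invented; only the first entry comes from the question.

```python
import re

# Same pattern as above; the slashes are left unescaped because Python's
# re module does not use / as a delimiter.
LINK_PATTERN = re.compile(
    r'(?<=")https?://[^/]*?\b(?:linkedin|instagram)\.\S+?(?=")'
)


def find_links(text):
    """Return every quoted linkedin/instagram URL found in text."""
    return LINK_PATTERN.findall(text)


# First entry is the sample from the question; the other two are made up.
sample = (
    ',"link_value":"https://www.linkedin.com/company/randomcompanyA"},'
    '"event":"link_click&'
    ',"other":"https://example.com/about"'
    ',"ig":"http://instagram.com/someuser"'
)

for link in find_links(sample):
    print(link)
```

With this sample, the linkedin and instagram links are printed and the example.com link is skipped, because [^/]*? cannot move past the first / after the host to find one of the keywords. To filter on other sites, add them to the (?:linkedin|instagram) alternation.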
