Charlie Shuffler

Reputation: 175

How to get string starting and ending with something, containing a substring?

I'm new to regex and I am trying to grab URLs from a big HTML text file. The links are "trapped" in strings of the following type:

,&quot;link_value&quot;:&quot;https://www.linkedin.com/company/randomcompanyA&quot;},&quot;event&quot;:&quot;link_click&

I want to write a regex that will get me any string starting and ending with &quot; and containing linkedin, instagram, etc. In other words, I want to grab links by specifying a substring they must contain; I do not want a general pattern that returns every link in the file. So far I've been able to write the following:

(?<=&quot;).+?(?=&quot;)

But I haven't been able to work the "contains linkedin" condition into it. As written, the pattern also returns link_value, for example.

Any help is appreciated!

Upvotes: 1

Views: 1060

Answers (2)

anubhava

Reputation: 785176

Since you're already using lookarounds, you can make your regex more specific by starting the match with http:// or https://, like this:

(?<=&quot;)https?:\/\/[^\/]*?\b(?:linkedin|instagram)\.\S+?(?=&quot;)


RegEx Details:

  • https?:\/\/ will match http:// or https://
  • [^\/]*? matches 0 or more of any character that is not / (lazy)
  • \b(?:linkedin|instagram)\. will match any of the given strings in the link followed by a dot.
  • \S+? matches 1 or more of any character that is not a whitespace (lazy)
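
As a rough illustration of the pattern above, here is a minimal Python sketch. The sample text is modelled on the string in the question; the instagram and example.com entries, and the variable names, are made up for the demonstration. It assumes the file still contains the literal &quot; entities.

```python
import re

# Sample text modelled on the question. Only the linkedin entry comes from
# the question; the instagram and example.com entries are hypothetical.
text = (
    ',&quot;link_value&quot;:&quot;https://www.linkedin.com/company/randomcompanyA&quot;},'
    '&quot;event&quot;:&quot;link_click&quot;,'
    '&quot;link_value&quot;:&quot;https://instagram.com/someuser&quot;,'
    '&quot;link_value&quot;:&quot;https://example.com/page&quot;'
)

# Same pattern as in the answer; escaping "/" is not needed in Python.
pattern = re.compile(
    r'(?<=&quot;)https?://[^/]*?\b(?:linkedin|instagram)\.\S+?(?=&quot;)'
)

# The pattern has no capturing groups, so findall returns the full matches.
links = pattern.findall(text)
print(links)
```

Only the URLs whose host part contains linkedin or instagram are returned; link_value, link_click and the example.com URL are skipped.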

Upvotes: 1

itsofirk

Reputation: 42

This regex will grab URLs regardless of the surrounding &quot; markers:

https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)

Let me know if it works.
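
A hedged sketch of how this general pattern might be combined with a substring filter in Python, since on its own it matches every URL. Note that the final character class includes &, so on entity-encoded text like the question's sample a match can run into the following &quot and needs trimming. The helper name find_keyword_urls, the KEYWORDS tuple and the sample text are assumptions for illustration, not part of the answer.

```python
import re

# The general URL pattern from the answer, unchanged.
URL_PATTERN = re.compile(
    r'https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b'
    r'([-a-zA-Z0-9()@:%_\+.~#?&//=]*)'
)

# Hypothetical list of substrings a link must contain to be kept.
KEYWORDS = ("linkedin", "instagram")


def find_keyword_urls(text):
    """Return matched URLs that contain at least one keyword."""
    urls = []
    for match in URL_PATTERN.finditer(text):
        url = match.group(0)  # full match; the pattern also has capture groups
        # "&" is allowed by the trailing character class but ";" is not,
        # so a match can end with a stray "&quot" from the closing entity.
        if url.endswith("&quot"):
            url = url[: -len("&quot")]
        if any(keyword in url for keyword in KEYWORDS):
            urls.append(url)
    return urls


# Sample modelled on the question; the example.com entry is made up.
sample = (
    ',&quot;link_value&quot;:&quot;https://www.linkedin.com/company/randomcompanyA&quot;},'
    '&quot;other&quot;:&quot;https://example.com/page&quot;'
)
print(find_keyword_urls(sample))
```

Filtering after matching keeps the URL pattern generic, at the cost of a second pass over the matches.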

Upvotes: 0
