Patrick Hennessey

Reputation: 471

How do I use GREP to match a string with quotation marks inside it?

I'm looking for a string match inside a group of HTML files. I'm looking for all matching instances of the form:

="https://  ...  .mp4"

Keep in mind that these are not on individual lines. They are all bunched together without any spaces, so returning actual lines won't work.

I want GREP to return every occurrence of this type of URL, not the lines that contain them. The output should be a list of unique links like this:

="https://www.something.com/file1.mp4"
="https://www.something.com/file2.mp4"
="https://www.something.com/file3.mp4"
="https://www.something.com/file4.mp4"

Here's the command I thought I would need:

grep -hRo '\="https://.*\.mp4"\>' *.html

The double quotes and equal sign should be a part of the actual search string, but those are messing up my result, and I can't figure out how to escape them properly.

I'm running this on OSX in the terminal. Any help would be appreciated.

Upvotes: 0

Views: 767

Answers (1)

vintnes

Reputation: 2030

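With basic regular expressions (grep's default), the single quotes already protect the double quotes from the shell, so the quotes and the equals sign need no escaping. The only backslashes you need are on the quantifier + (one or more), which basic syntax writes as \+, and on the literal dot in .mp4. Using [^"] instead of .* stops each match at the next double quote, so every URL is returned separately: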

grep -o '="http[^"]\+\.mp4"'
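A self-contained way to try this is sketched below, under a few assumptions: the sample file and its contents are made up, -E (extended regex) is swapped in so that + can be written without a backslash, and sort -u is added to produce the unique list the question asks for.

```shell
# Throwaway sample file (hypothetical content): links packed onto one line,
# one duplicate, and one non-.mp4 URL that should be ignored.
tmpdir=$(mktemp -d)
printf '%s' '<a href="https://www.something.com/file1.mp4">1</a><a href="https://www.something.com/file2.mp4">2</a><a href="https://www.something.com/file1.mp4">again</a><img src="https://www.something.com/pic.png">' > "$tmpdir/sample.html"

# -E: extended regex, so + needs no backslash
# -h: omit file names, -o: print only the matched text
# [^"]+ cannot cross a double quote, so each URL is a separate match
links=$(grep -Eho '="https://[^"]+\.mp4"' "$tmpdir"/*.html | sort -u)
printf '%s\n' "$links"

rm -rf "$tmpdir"
```

The pattern in the question fails for two reasons: .* is greedy and runs to the last .mp4" on the line, and \> (end of word) cannot match directly after a double quote, because a quote is not a word character.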

With PCRE, available in GNU grep through the -P option, you can match the leading =" and the trailing " without printing them by using lookarounds:

grep -Po '(?<==")http[^"]+\.mp4(?=")'
  • (?<= ... ) - lookbehind
  • (?= ... ) - lookahead

returns:

https://www.something.com/file1.mp4
https://www.something.com/file2.mp4
https://www.something.com/file3.mp4
https://www.something.com/file4.mp4
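If -P is not available (the grep that ships with macOS is typically BSD grep, which does not offer it), a similar result can be sketched with a different technique: let grep -Eo match the delimiters and then strip them with sed. The sample HTML string here is made up for illustration.

```shell
# Hypothetical input: two links on one line with no whitespace between tags.
html='<video src="https://www.something.com/file1.mp4"></video><video src="https://www.something.com/file2.mp4"></video>'

# Step 1: grep -Eo prints each ="...mp4" match, delimiters included.
# Step 2: sed removes the leading =" and the trailing " from each line.
urls=$(printf '%s\n' "$html" \
  | grep -Eo '="https://[^"]+\.mp4"' \
  | sed -e 's/^="//' -e 's/"$//')
printf '%s\n' "$urls"
```

Anchoring the sed expressions with ^ and $ keeps any = character inside the URL itself untouched.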

Upvotes: 3
