Reputation: 471
I'm trying to match a string inside a group of HTML files. I want every instance of the form:
="https:// ... .mp4"
Keep in mind that these are not on individual lines. They are all bunched together without any spaces, so returning actual lines won't work.
I want grep to return every instance of this type of URL. The output should be a list of unique links like this:
="https://www.something.com/file1.mp4"
="https://www.something.com/file2.mp4"
="https://www.something.com/file3.mp4"
="https://www.something.com/file4.mp4"
Here's the command I thought I would need:
grep -hRo '\="https://.*\.mp4"\>' *.html
The double quotes and equal sign should be a part of the actual search string, but those are messing up my result, and I can't figure out how to escape them properly.
I'm running this on OSX in the terminal. Any help would be appreciated.
Upvotes: 0
Views: 767
Reputation: 2030
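With a traditional (basic) regex, the single quotes around the pattern already protect the double quotes from the shell, so neither the equals sign nor the double quotes need a backslash. The .* in your pattern is greedy: when several links sit on one line, it matches from the first ="https:// all the way to the last .mp4". Using [^"] instead stops each match at its own closing quote. You only need to escape the quantifier + (one or more) and the literal dot in .mp4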
grep -o '="http[^"]\+\.mp4"'
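As a rough sketch of how this plays out on links bunched together on one line, the snippet below builds a hypothetical sample.html (the file name and its contents are made up for illustration) and runs an equivalent pattern over it. It uses -E so that + needs no backslash, since \+ in a basic regex is a GNU extension and may not behave the same in every grep. The sort -u step is an addition to cover the "unique links" part of the question.

```shell
# Create a hypothetical sample file with several links on a single line,
# including one duplicate and one non-.mp4 URL.
tmpdir=$(mktemp -d)
printf '%s\n' '<a href="https://www.something.com/file1.mp4">a</a><video src="https://www.something.com/file2.mp4"></video><a href="https://www.something.com/file1.mp4">b</a><img src="https://www.something.com/pic.png">' > "$tmpdir/sample.html"

# -h: omit file names, -o: print only the matched text,
# -E: extended regex, so + is a quantifier without a backslash.
# [^"]+ cannot cross a double quote, so each match ends at its own closing quote.
links=$(grep -hoE '="https://[^"]+\.mp4"' "$tmpdir"/*.html | sort -u)

printf '%s\n' "$links"
rm -rf "$tmpdir"
```

The duplicate file1.mp4 link is collapsed by sort -u, and the .png URL is skipped because the match cannot run past its closing quote to reach a later .mp4.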
With PCRE, available in GNU grep through the -P option, you can match the leading equals sign and the surrounding double quotes without printing them by using lookarounds:
grep -Po '(?<==")http[^"]+\.mp4(?=")'
(?<= ... ) - lookbehind
(?= ... ) - lookahead
It returns:
https://www.something.com/file1.mp4
https://www.something.com/file2.mp4
https://www.something.com/file3.mp4
https://www.something.com/file4.mp4
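If -P is not available (as far as I know, the BSD grep shipped with macOS does not support it), the same bare URLs can be approximated by matching with the quotes included and then stripping them. This is only a sketch under that assumption; the html variable and its contents are made up for illustration.

```shell
# Hypothetical input: two links on one line with no whitespace between tags.
html='<a href="https://www.something.com/file1.mp4">x</a><a href="https://www.something.com/file2.mp4">y</a>'

# Step 1: grep -oE prints each ="https://....mp4" match on its own line.
# Step 2: sed removes the leading =" and the trailing " from every match.
urls=$(printf '%s\n' "$html" \
  | grep -oE '="https://[^"]+\.mp4"' \
  | sed -e 's/^="//' -e 's/"$//')

printf '%s\n' "$urls"
```

Piping the result through sort -u would also remove duplicates if the same link appears more than once.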
Upvotes: 3