How to extract content between tags in html using grep command

Question

I want to write a grep command that extracts the content between h1 tags, irrespective of class and other attributes.

I tried

 grep -o '>.*' Email.txt

But it returned only three elements.

Wiktor Stribiżew · Accepted Answer

With GNU grep, you may use

grep -oP '<h1(?:\s[^>]*)?>\K.*?(?=</h1>)' Email.txt

The -P option enables the PCRE regex engine, and the pattern matches:

- <h1 - a literal <h1 string
- (?:\s[^>]*)? - an optional non-capturing group matching 1 or 0 occurrences of a whitespace (\s) followed with 0+ chars other than >
- > - a > char
- \K - match reset operator that discards the text matched so far from the match memory buffer
- .*? - any 0+ chars other than line break chars, as few as possible
- (?=</h1>) - a positive lookahead that matches a location that is immediately followed with a </h1> substring.
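
To see how the pieces of the pattern behave together, here is a minimal sketch run against a made-up sample file. The contents of the real Email.txt are not shown in the question, so the input below is purely hypothetical, and the `sample` and `result` variable names are only for illustration. It assumes GNU grep built with PCRE support.

```shell
# Hypothetical sample input; the real Email.txt is not shown in the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<h1>Plain heading</h1>
<h1 class="title" id="main">Heading with attributes</h1>
<p>Not a heading</p>
<h1>First</h1><h1 style="color:red">Second</h1>
EOF

# -o prints only the matched text; -P enables PCRE (GNU grep only).
# \K drops the opening tag from the match, and the lookahead keeps
# the closing tag out of it, so only the inner text is printed.
result=$(grep -oP '<h1(?:\s[^>]*)?>\K.*?(?=</h1>)' "$sample")
printf '%s\n' "$result"
# Plain heading
# Heading with attributes
# First
# Second

rm -f "$sample"
```

Because grep works line by line and . does not match line breaks, this sketch only finds headings whose opening and closing tags sit on the same line.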
