Goodies

Reputation: 4681

Grep and Regex an HTML File

I have an HTML file with thousands of lines, and one kind of line is repeated throughout it:

CODE=12345-ABCDE-12345-ABCDE</div>...<!--This line goes on for hundreds of characters-->

The line starts with "CODE=" every time, and the code is the same length every time: the 23 characters that follow are letters, numbers, or dashes, so the prefix and the code together take up the first 28 characters.

cat mysite.html | grep "CODE="

But I'd like a regex that displays only the part of the line before </div>.

Thanks!

Upvotes: 0

Views: 116

Answers (2)

ray

Reputation: 4267

You can also use sed:

sed -rn 's@^(CODE=[A-Za-z0-9-]{23})</div>.*@\1@p' file

This matches any line starting with CODE= followed by 23 characters that are letters, numbers, or dashes, followed by </div>, and prints only the CODE=... part.
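
As a quick check, here is a minimal sketch that runs the same kind of substitution on a few made-up lines. The sample input is an assumption for illustration, not taken from the real file, and the sketch uses -E in place of -r (GNU sed treats the two as equivalent, and BSD sed accepts -E as well).

```shell
# Hypothetical sample input: one CODE= line between two ordinary HTML lines.
result=$(printf '%s\n' \
  '<html><body>' \
  'CODE=12345-ABCDE-12345-ABCDE</div><p>rest of a very long line</p>' \
  '<div>no code here</div>' |
  # -n suppresses the default output; the trailing p prints only the lines
  # where the substitution succeeded, reduced to the captured group \1.
  sed -En 's@^(CODE=[A-Za-z0-9-]{23})</div>.*@\1@p')

echo "$result"
```

Only the CODE= line produces output; the other two lines do not match the pattern and are dropped because of -n.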

Upvotes: 0

Simeon Visser

Reputation: 122376

You can use cut instead:

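grep "^CODE=" myfile.html | cut -c 6-28

This shows characters 6-28 of each line that starts with CODE=. It relies on the fact that both the length of CODE= and the length of the code that follows are known. cut itself operates on every line it receives, so grep selects the CODE= lines first.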
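
To make the character ranges concrete, here is a small sketch on a made-up sample line (the line content is an assumption for illustration). CODE= occupies characters 1-5 and the 23-character code occupies characters 6-28, so starting the range at 1 keeps the prefix as well.

```shell
# Hypothetical sample line in the shape described in the question.
line='CODE=12345-ABCDE-12345-ABCDE</div><p>rest of the line</p>'

# Characters 6-28: the code alone, without the CODE= prefix.
code_only=$(printf '%s\n' "$line" | cut -c 6-28)

# Characters 1-28: everything before </div>, prefix included.
with_prefix=$(printf '%s\n' "$line" | cut -c 1-28)

echo "$code_only"
echo "$with_prefix"
```

Because cut counts positions rather than matching a pattern, this only works while the prefix and code lengths stay fixed.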

Upvotes: 1
