user983223

A real reverse in regular expression

I have a text file that lists possible problems. Each entry starts with a URL line and ends with a Result line that shows the error code, if any. I want to go through the file, pull out every block whose result is Error: 404 Not Found, and write all of those blocks to a separate text file. I found this:

awk '/URL/,/404 Not Found/' text.txt > only404.txt

The problem is that the range starts at the first URL line and does not stop until it reaches 404 Not Found, so in the example below it also picks up the block that ends in Valid: 200 OK. What I would really like is to search for 404 Not Found and then search backwards until it reaches URL. That would give me just the failing block. Any ideas?

    URL //fonts.googleapis.com/css?family=Lato:300,400,400italic,700'
    Parent URL http://example.com, line 12, col 1
    Real URL   http://fonts.googleapis.com/css?family=Lato:300,400,400italic,700
    Check time 1.863 seconds
    Warning    Access denied by robots.txt, skipping content checks.
    Result     Valid: 200 OK

    URL   `/image.png'
    Parent URL http://example.com/styles.css, line 1380, col 17
    Real URL   http://example.com/image.png
    Check time 0.443 seconds
    Size       1KB
    Result     Error: 404 Not Found

Answers (2)

potong

This might work for you:

sed '/^\s*URL/,/^\s*Result/{/^\s*URL/{h;d};H;/Error: 404/{g;b}};d' file

Output for the sample input above:
    URL   `/image.png'
    Parent URL http://example.com/styles.css, line 1380, col 17
    Real URL   http://example.com/image.png
    Check time 0.443 seconds
    Size       1KB
    Result     Error: 404 Not Found
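The same idea (remember the current block, then print it only once its Result line turns out to be a 404) can also be written in awk. This is a translation of the approach, not part of the answer above. It assumes every entry starts with a line whose first field is URL and ends with a line whose first field is Result, and the function name extract_404 is hypothetical. Reading forward and buffering the block means there is no need to "search backwards" at all.

```shell
#!/bin/sh
# Hypothetical helper: print every URL...Result block whose Result line
# reports "Error: 404". Assumes each block begins with a line whose first
# field is "URL" and ends with a line whose first field is "Result".
extract_404() {
  awk '
    # A new entry starts: remember its first line (like sed "h").
    $1 == "URL" { buf = $0; inblock = 1; next }

    # Inside an entry: append the line to the buffer (like sed "H").
    inblock { buf = buf "\n" $0 }

    # End of entry: print the buffered block only if it is a 404.
    $1 == "Result" {
      if (inblock && /Error: 404/) print buf
      inblock = 0
    }
  ' "$1"
}
```

Usage: extract_404 text.txt > only404.txt. Unlike the sed version, this sticks to POSIX awk features, whereas \s in the sed script is a GNU extension.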

Kent

This may work for you:

 awk -v RS="" '/404 Not Found/' yourFile

Here is a test run. Is this what you want?

kent$  cat t
    URL //fonts.googleapis.com/css?family=Lato:300,400,400italic,700'
    Parent URL http://example.com, line 12, col 1
    Real URL   http://fonts.googleapis.com/css?family=Lato:300,400,400italic,700
    Check time 1.863 seconds
    Warning    Access denied by robots.txt, skipping content checks.
    Result     Valid: 200 OK

    URL   `/image.png'
    Parent URL http://example.com/styles.css, line 1380, col 17
    Real URL   http://example.com/image.png
    Check time 0.443 seconds
    Size       1KB
    Result     Error: 404 Not Found

kent$  awk -v RS="" '/404 Not Found/' t
    URL   `/image.png'
    Parent URL http://example.com/styles.css, line 1380, col 17
    Real URL   http://example.com/image.png
    Check time 0.443 seconds
    Size       1KB
    Result     Error: 404 Not Found

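If the file contains several 404 blocks, the paragraph-mode command prints them back to back, because awk ends each record with the default ORS (a single newline). A small variation, added here rather than taken from the answer, sets ORS to two newlines so the blocks in the output stay separated by a blank line, as they are in the input. The function name extract_404_blocks is hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper around the paragraph-mode answer.
# RS=""      : each blank-line-separated block is read as one record.
# ORS="\n\n" : write a blank line after every matching block.
extract_404_blocks() {
  awk -v RS="" -v ORS="\n\n" '/404 Not Found/' "$1"
}
```

Usage: extract_404_blocks text.txt > only404.txt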