Reputation: 1154
I have a text file that lists possible problems. Each entry starts with a URL line and ends with a Result line, which carries the error code if there is one. I want to go through the file, pick out every block whose result is Error: 404 Not Found, and write those blocks to a separate text file. I found this:
awk '/URL/,/404 Not Found/' text.txt > only404.txt
The problem is that it starts at the first URL and keeps going until it reaches 404 Not Found, so in the example below the output also includes the Valid: 200 OK block. What I would really like is to search for 404 Not Found and then work backwards until it reaches URL. Any ideas?
URL `//fonts.googleapis.com/css?family=Lato:300,400,400italic,700'
Parent URL http://example.com, line 12, col 1
Real URL http://fonts.googleapis.com/css?family=Lato:300,400,400italic,700
Check time 1.863 seconds
Warning Access denied by robots.txt, skipping content checks.
Result Valid: 200 OK

URL `/image.png'
Parent URL http://example.com/styles.css, line 1380, col 17
Real URL http://example.com/image.png
Check time 0.443 seconds
Size 1KB
Result Error: 404 Not Found
Upvotes: 1
Views: 242
Reputation: 58430
This might work for you (GNU sed, since it uses \s):
sed '/^\s*URL/,/^\s*Result/{/^\s*URL/{h;d};H;/Error: 404/{g;b}};d' file
URL `/image.png'
Parent URL http://example.com/styles.css, line 1380, col 17
Real URL http://example.com/image.png
Check time 0.443 seconds
Size 1KB
Result Error: 404 Not Found
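To make the hold-space logic easier to follow, here is a rough, commented sketch of the same idea using `sed -n` and POSIX character classes. It is a variant for illustration, not the one-liner above; the file names `sed_sample.txt` and `sed_only404.txt` and the shortened sample entries are made up for the example.

```shell
# Hypothetical, shortened sample: two entries, no blank line between them.
cat > sed_sample.txt <<'EOF'
URL `/ok.css'
Check time 1.863 seconds
Result Valid: 200 OK
URL `/image.png'
Check time 0.443 seconds
Result Error: 404 Not Found
EOF

sed -n '
# A line starting with URL opens a new entry: overwrite the hold space with it.
/^[[:space:]]*URL/ { h; d; }
# Any other line is appended to the entry collected so far.
H
# On a 404 result, copy the collected entry back and print it.
/Error: 404/ { g; p; }
' sed_sample.txt > sed_only404.txt
```

Lines such as "Parent URL" or "Real URL" do not reset the buffer, because the pattern is anchored to the start of the line.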
Upvotes: 1
Reputation: 195079
This may work for you, assuming the entries are separated by blank lines (RS="" puts awk in paragraph mode):
awk -v RS="" '/404 Not Found/' yourFile
Test (is this what you want?):
kent$ cat t
URL `//fonts.googleapis.com/css?family=Lato:300,400,400italic,700'
Parent URL http://example.com, line 12, col 1
Real URL http://fonts.googleapis.com/css?family=Lato:300,400,400italic,700
Check time 1.863 seconds
Warning Access denied by robots.txt, skipping content checks.
Result Valid: 200 OK

URL `/image.png'
Parent URL http://example.com/styles.css, line 1380, col 17
Real URL http://example.com/image.png
Check time 0.443 seconds
Size 1KB
Result Error: 404 Not Found
kent$ awk -v RS="" '/404 Not Found/' t
URL `/image.png'
Parent URL http://example.com/styles.css, line 1380, col 17
Real URL http://example.com/image.png
Check time 0.443 seconds
Size 1KB
Result Error: 404 Not Found
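If the entries are not reliably separated by blank lines, paragraph mode will not split them. A minimal sketch of an alternative is to buffer each entry from its URL line to its Result line and print the buffer only when the Result line mentions 404 Not Found. The file name `sample.txt` is an assumption for the example; `only404.txt` is the output name from the question.

```shell
# Sample data from the question, written without blank lines between entries.
cat > sample.txt <<'EOF'
URL `//fonts.googleapis.com/css?family=Lato:300,400,400italic,700'
Parent URL http://example.com, line 12, col 1
Real URL http://fonts.googleapis.com/css?family=Lato:300,400,400italic,700
Check time 1.863 seconds
Warning Access denied by robots.txt, skipping content checks.
Result Valid: 200 OK
URL `/image.png'
Parent URL http://example.com/styles.css, line 1380, col 17
Real URL http://example.com/image.png
Check time 0.443 seconds
Size 1KB
Result Error: 404 Not Found
EOF

awk '
  # A line that starts with URL begins a new entry: reset the buffer.
  /^[[:space:]]*URL/ { buf = "" }
  # Collect every line of the current entry.
  { buf = buf $0 "\n" }
  # The Result line closes the entry: print it only for a 404.
  /^[[:space:]]*Result/ {
    if (/404 Not Found/) printf "%s", buf
    buf = ""
  }
' sample.txt > only404.txt
```

This amounts to the "search backwards from 404" idea done in a single forward pass: the decision to print is deferred until the Result line is seen.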
Upvotes: 3