wxmikey

Reputation: 47

Grep for string and read content until next match string

I am trying to read a file and search for a string using grep. Once I find the string, I want to read everything after it until another string matches. In my example, I am searching for ...SUMMARY... and I want to read everything up to the next occurrence of ... Here is an example:

**...SUMMARY...**
   Severe thunderstorms are most likely across north-central/northeast
   Texas and the Ark-La-Tex region during the late afternoon and
   evening. Destructive hail and wind, along with a few tornadoes are
   possible. Severe thunderstorms are also expected across the
   Mid-South and Ohio Valley.

   **...**North-central/northeast TX and southeast OK/ArkLaTex...
   In the wake of a decaying MCS across the Lower Mississippi River
   Valley, a northwestward-extending outflow boundary will continue to
   modify/drift northward with rapid/strong destabilization this
   afternoon particularly along and south of it. A quick
   reestablishment of lower/some middle 70s F surface dewpoints will
   occur into prior-MCS-impacted areas, with MLCAPE in excess of 4000
   J/kg expected for parts of north-central/northeast Texas into far
   southeast Oklahoma and the nearby ArkLaTex. Special 19Z observed
   soundings are expected from Fort Worth/Shreveport to help better
   gauge/confirm this destabilization trend and the degree of capping.

I have tried the following code, but it only displays the ...SUMMARY... line and the line after it.

sed -n '/...SUMMARY.../,/.../p' 

What can I do to solve this?

======================================================================= Followup:

This is the result I am trying to get: show only the paragraph under ...SUMMARY... and stop at the next ... so in the end I should get:

Severe thunderstorms are most likely across north-central/northeast Texas and the Ark-La-Tex region during the late afternoon and evening. Destructive hail and wind, along with a few tornadoes are possible. Severe thunderstorms are also expected across the Mid-South and Ohio Valley.

I have tried the following, based on a recommendation from Shellter:

sed -n '/...SUMMARY.../,/**...**/p'

But I get everything.

Upvotes: 4

Views: 359

Answers (2)

user3408541

Reputation: 63

ahoy, ahoy!

I realize you asked for an answer using grep or sed, but please consider Perl. Here is a solution. I tried playing around with the range operators, but they seemed a bit flimsy, so I just wrote the regex myself.

I removed the asterisks from the sample file, which you said were only there for emphasis, and moved the trailing ... onto a new header. That is closer to the markup on the website you linked to. I made it a global match, so it will find multiple summaries if there is more than one. When I checked the HTML there appeared to be only one, but I will leave the match global in case more are added one day.

#!/usr/bin/perl
use strict;
use warnings;

undef $/;  # slurp mode: read each input whole, because a summary spans several lines
my $i = 1;
while (<>) {
  # Non-greedy capture, so the match stops at the first "..." after the summary
  while (/\.{3}SUMMARY\.{3}([\w\W]*?)\.{3}/g) {
    print "Match $i\n";
    print "-------------------\n";
    print "$1\n" if ($1);
    print "-------------------\n";
    $i++;
  }
}

Here is the sample input file with the changes I described above.

   ...SUMMARY...
   Severe thunderstorms are most likely across north-central/northeast
   Texas and the Ark-La-Tex region during the late afternoon and
   evening. Destructive hail and wind, along with a few tornadoes are
   possible. Severe thunderstorms are also expected across the
   Mid-South and Ohio Valley.

   ...NEXT HEADER...
   North-central/northeast TX and southeast OK/ArkLaTex...
   In the wake of a decaying MCS across the Lower Mississippi River
   Valley, a northwestward-extending outflow boundary will continue to
   modify/drift northward with rapid/strong destabilization this
   afternoon particularly along and south of it. A quick
   reestablishment of lower/some middle 70s F surface dewpoints will
   occur into prior-MCS-impacted areas, with MLCAPE in excess of 4000
   J/kg expected for parts of north-central/northeast Texas into far
   southeast Oklahoma and the nearby ArkLaTex. Special 19Z observed
   soundings are expected from Fort Worth/Shreveport to help better
   gauge/confirm this destabilization trend and the degree of capping.

Output looks like this

$ perl rangeOperatorRegex.pl rangeOperatorRegex.txt
Match 1
-------------------

   Severe thunderstorms are most likely across north-central/northeast
   Texas and the Ark-La-Tex region during the late afternoon and
   evening. Destructive hail and wind, along with a few tornadoes are
   possible. Severe thunderstorms are also expected across the
   Mid-South and Ohio Valley.

   
-------------------

If you wanted to grab data from https://www.spc.noaa.gov/products/outlook/day1otlk.html via wget, you could run something like this. I named the Perl script rangeOperatorRegex.pl, but you can rename it to anything you want.

$ wget -qO - https://www.spc.noaa.gov/products/outlook/day1otlk.html | perl rangeOperatorRegex.pl

Output looks like this

Match 1
-------------------

   Severe thunderstorms appear unlikely through tonight.

   
-------------------

Here is an answer on how to pipe wget input into Perl via STDIN.

how to retrive a perl file using wget and execute it using a one-liner?

Good Luck!

Upvotes: 0

Wiktor Stribiżew

Reputation: 626690

You may use

sed -n '/^[[:blank:]]*\.\.\.SUMMARY\.\.\./,/^[[:blank:]]*\.\.\./{//!p;}' file

See this online sed demo.
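As a rough sketch of how this command behaves, the script below runs it against a shortened copy of the text from the question. The file name outlook.txt and the trimmed sample are assumptions made for illustration. In the original attempt, each unescaped `.` matches any character, so `/.../` matches any line with at least three characters and the range closes on the very next line. Escaping the dots and anchoring them to the start of the line (after optional blanks) avoids that, and `//!p` prints only the lines between the two headers. The final step, which joins the lines into one paragraph with `paste`, is an addition that is not part of the answer above.

```shell
# Build a small sample file (shortened from the question's text).
cat > outlook.txt <<'EOF'
   ...SUMMARY...
   Severe thunderstorms are most likely across north-central/northeast
   Texas and the Ark-La-Tex region during the late afternoon and
   evening.

   ...North-central/northeast TX and southeast OK/ArkLaTex...
   In the wake of a decaying MCS across the Lower Mississippi River
   Valley.
EOF

# Range: from the SUMMARY header to the next line that starts with "...".
# Inside the range, the empty pattern // reuses the last regex that was tried,
# so //!p prints every line of the range except the two header lines.
summary=$(sed -n '/^[[:blank:]]*\.\.\.SUMMARY\.\.\./,/^[[:blank:]]*\.\.\./{//!p;}' outlook.txt)
printf '%s\n' "$summary"

# Optional: strip the indentation, drop empty lines, and join everything
# into a single paragraph separated by spaces.
paragraph=$(printf '%s\n' "$summary" | sed -e 's/^[[:blank:]]*//' -e '/^$/d' | paste -sd ' ' -)
printf '%s\n' "$paragraph"
```

The command substitution drops the trailing blank line that sits between the summary and the next header, so `summary` holds only the three text lines.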

NOTES:

Upvotes: 1
