Reputation: 47
I am trying to read a file and search for a string using grep. Once I find the string, I want to read everything after it until I match another string. In my example, I am searching for ...SUMMARY... and I want to read everything until the next occurrence of ...
Here is an example:
**...SUMMARY...**
Severe thunderstorms are most likely across north-central/northeast
Texas and the Ark-La-Tex region during the late afternoon and
evening. Destructive hail and wind, along with a few tornadoes are
possible. Severe thunderstorms are also expected across the
Mid-South and Ohio Valley.
**...**North-central/northeast TX and southeast OK/ArkLaTex...
In the wake of a decaying MCS across the Lower Mississippi River
Valley, a northwestward-extending outflow boundary will continue to
modify/drift northward with rapid/strong destabilization this
afternoon particularly along and south of it. A quick
reestablishment of lower/some middle 70s F surface dewpoints will
occur into prior-MCS-impacted areas, with MLCAPE in excess of 4000
J/kg expected for parts of north-central/northeast Texas into far
southeast Oklahoma and the nearby ArkLaTex. Special 19Z observed
soundings are expected from Fort Worth/Shreveport to help better
gauge/confirm this destabilization trend and the degree of capping.
I have tried the following command, but it only displays the ...SUMMARY... line and the line after it.
sed -n '/...SUMMARY.../,/.../p'
What can I do to solve this?
======================================================================= Followup:
This is the result I am trying to get: show only the paragraph under ...SUMMARY... and stop at the next ... so the final output should be:
Severe thunderstorms are most likely across north-central/northeast Texas and the Ark-La-Tex region during the late afternoon and evening. Destructive hail and wind, along with a few tornadoes are possible. Severe thunderstorms are also expected across the Mid-South and Ohio Valley.
I have tried the following, based on a recommendation from Shellter:
sed -n '/...SUMMARY.../,/**...**/p'
But I get everything.
Upvotes: 4
Views: 359
Reputation: 63
ahoy, ahoy!
I realize you asked for an answer using grep or sed, but please consider Perl. Here is a solution. I tried playing around with the range operators and they seemed a bit flimsy, so I just wrote the regex myself.
I removed the asterisks from the sample file, which you said were there for emphasis, and moved the trailing ... onto a new header line. That is closer to the text on the website you linked to. I made it a global match so it will find multiple summaries if there is more than one. When I checked the HTML, there appeared to be only one, but I will leave it global in case there are more one day.
#!/usr/bin/perl -w
# Slurp the entire input instead of reading line by line, because a match spans newlines.
undef $/;
my $i = 1;
while (<>) {
    # The regex is non-greedy, so it stops at the first closing "..." instead of the last one.
    while (/\.{3}SUMMARY\.{3}([\w\W]*?)\.{3}/g) {
        print "Match $i\n";
        print "-------------------\n";
        print "$1\n" if ($1);
        print "-------------------\n";
        $i++;
    }
}
Here is the sample input file with the changes I talked about earlier.
...SUMMARY...
Severe thunderstorms are most likely across north-central/northeast
Texas and the Ark-La-Tex region during the late afternoon and
evening. Destructive hail and wind, along with a few tornadoes are
possible. Severe thunderstorms are also expected across the
Mid-South and Ohio Valley.
...NEXT HEADER...
North-central/northeast TX and southeast OK/ArkLaTex...
In the wake of a decaying MCS across the Lower Mississippi River
Valley, a northwestward-extending outflow boundary will continue to
modify/drift northward with rapid/strong destabilization this
afternoon particularly along and south of it. A quick
reestablishment of lower/some middle 70s F surface dewpoints will
occur into prior-MCS-impacted areas, with MLCAPE in excess of 4000
J/kg expected for parts of north-central/northeast Texas into far
southeast Oklahoma and the nearby ArkLaTex. Special 19Z observed
soundings are expected from Fort Worth/Shreveport to help better
gauge/confirm this destabilization trend and the degree of capping.
Output looks like this
$ perl rangeOperatorRegex.pl rangeOperatorRegex.txt
Match 1
-------------------
Severe thunderstorms are most likely across north-central/northeast
Texas and the Ark-La-Tex region during the late afternoon and
evening. Destructive hail and wind, along with a few tornadoes are
possible. Severe thunderstorms are also expected across the
Mid-South and Ohio Valley.
-------------------
If you wanted to grab data from https://www.spc.noaa.gov/products/outlook/day1otlk.html via wget, you could run something like this. I named the Perl script rangeOperatorRegex.pl, but you can rename it to anything you want.
$ wget -qO - https://www.spc.noaa.gov/products/outlook/day1otlk.html | perl rangeOperatorRegex.pl
Output looks like this
Match 1
-------------------
Severe thunderstorms appear unlikely through tonight.
-------------------
Here is an answer on how to pipe wget output into Perl via STDIN: how to retrive a perl file using wget and execute it using a one-liner?
Good Luck!
Upvotes: 0
Reputation: 626690
You may use
sed -n '/^[[:blank:]]*\.\.\.SUMMARY\.\.\./,/^[[:blank:]]*\.\.\./{//!p;}' file
See this online sed demo.
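To try it locally, here is a small sketch that runs both the original attempt and the command above on a trimmed copy of the text from the question. The variable name sample and the function name between_summary are made up for this illustration.

```shell
# Trimmed copy of the question's text (asterisks removed, paragraph shortened).
sample='...SUMMARY...
Severe thunderstorms are most likely across north-central/northeast
Texas and the Ark-La-Tex region during the late afternoon and
...North-central/northeast TX and southeast OK/ArkLaTex...
In the wake of a decaying MCS across the Lower Mississippi River'

# Original attempt: the dots are unescaped, so the end pattern /.../ matches
# any line with at least 3 characters and the range closes one line after
# the header.
printf '%s\n' "$sample" | sed -n '/...SUMMARY.../,/.../p'

# Escaped and anchored version: the range runs to the next line that starts
# with three literal dots, and //!p leaves out both boundary lines.
between_summary() {
  sed -n '/^[[:blank:]]*\.\.\.SUMMARY\.\.\./,/^[[:blank:]]*\.\.\./{//!p;}' "$@"
}
printf '%s\n' "$sample" | between_summary
```

The first command prints only the header and the line after it, which is the behavior described in the question; the second prints just the lines between the two markers.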
NOTES:
/.../ just matches a line with any 3 chars, so the dots must be escaped as \. to match literal dots
^ matches the start of a line and [[:blank:]]* matches any 0+ horizontal whitespace chars (spaces and tabs)
{//!p;} gets you the contents between two lines excluding those lines (see How to print lines between two patterns, inclusive or exclusive (in sed, AWK or Perl)?)
Upvotes: 1