Reputation: 15109
I have a multiline output, like this:
foo: some text
goes here
and here
and here
bar: more text
goes here
and here
xyz: and more...
and more...
and more...
The text's format is exactly as shown here. The "groups/sections" of text I'm interested in start where text begins right at the beginning of a line and end at the line before the next text that starts right at the beginning of a line.
In this example the groups would be foo and all the text right before bar, then bar and all the text right before xyz, and finally xyz until the end.
Upvotes: 0
Views: 3579
Reputation: 161
First, if there's a single section, go with @Akshay Hegde. Otherwise, if you can change the RS, follow @sheltond. But for logfile processing I often need to extract some things linewise and some sections as multi-line blocks, so that the logfile summary ends up as short as possible.
Here I usually use some variation on a braindead pattern. For example, suppose I want to extract the bar sections (plus a few single interesting lines).
File print_bar_sections.awk:
function bar_may_end_here() {   # This check might happen in several places
    if (bar_started) {
        print(bar_out); bar_out=""; bar_started=0;
    }
}
# Here, any section-begin match might be terminating a bar section
/^[a-z]*:/ { bar_may_end_here(); }
# Match the start of an interesting section; this line is always included
/^bar:/ { bar_started=1; bar_out=$0; next; }
# Perhaps modify or skip interior lines?
# bar_started==1 && /goes/ { bar_out = bar_out "GOES-LINE"; next; }
# Here, join lines
bar_started==1 { bar_out = bar_out $0; next; }
# Here we know we are not in a bar section.
# For example, we might have single-line "interesting lines"
/error/   { print; next; }
/warning/ { print; next; }
# EOF might also terminate an active bar section
# (for logfiles you might know this is impossible)
END { bar_may_end_here(); }
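Running that over the sample input from the question should give the following (note that the bar_out join above concatenates lines with no separator; use bar_out = bar_out " " $0 there if you want a space between the joined pieces):
$ awk -f print_bar_sections.awk file
bar: more textgoes hereand here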
Adjust this pattern as needed. awk starts with all variables unset, which compare equal to the empty string and to 0, so bar_started and bar_out need no initialization. The next command is especially useful when creating such section extractors for log file processing.
Sometimes this approach of creating a state-machine variable like bar_started and state info like a bar_out string can allow rather more complicated awk programs. For example, the state variable might need more values than 0 or 1, and the stored state info might be more complex (an array or several variables). Enjoy!
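As a rough sketch of that last point (my own generalization, not part of the script above), a version that collects every section into an array keyed by its header could look like this:
# collect_sections.awk -- hypothetical generalization of the pattern above
/^[a-z]*:/ { key = $1; sections[key] = $0; next }     # $1 is the header, e.g. "bar:"
key != ""  { sections[key] = sections[key] "\n" $0 }  # append body lines to the open section
END        { for (k in sections) print sections[k] "\n----" }  # for-in order is unspecified
Here sections holds the "more complex stored state" and key plays the role of the state variable.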
Upvotes: 1
Reputation: 1937
As others have said, you haven't specified what you want to do with the data once you've parsed it.
If you just want to extract a particular chunk, the answer from Akshay Hegde should work fine.
If you want to process each record using some more awk functionality, such as transforming the output in some way (e.g. joining the lines together, etc), you probably need something a bit different.
There are a couple of fairly easy ways that you can do this, but I think the best approach is probably to change the record separator.
The ability to use a regular expression as the record separator is a gawk extension, but you're probably using gawk if you're on Linux.
Here is the contents of a gawk program file "prog.awk":
function process_group(name, body) {
print "Got group with name '" name "'";
print body;
}
BEGIN {
RS="(\n|^)\\S+:"
PREV=""
}
{
if (PREV!="") {
process_group(gensub(/\n?(\S+):/, "\\1", "", PREV), $0);
}
PREV=RT
}
You can run this using
gawk -f prog.awk input.txt
Alternatively you can put the whole thing on the gawk command-line, but it's easier to read if it's nicely formatted.
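If you do want the command-line form, it might look roughly like this (the print inside process_group is simplified a little so no single quotes appear inside the shell's single quotes):
$ gawk 'function process_group(name, body) { print "Got group with name " name; print body }
BEGIN { RS="(\n|^)\\S+:"; PREV="" }
{ if (PREV!="") process_group(gensub(/\n?(\S+):/, "\\1", "", PREV), $0); PREV=RT }' input.txt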
The idea is that each time it sees the record separator it gives you the content since the last record separator or the beginning of the file. This means that the first time it sees the record separator it calls the bottom block with the record separator "foo:" and an empty body, the second time it sees the record separator it calls the block with "bar:" and the content between "foo:" and "bar:", etc.
This means that the record separator corresponding to each block is the previous one, not the current one. This is easy to handle by keeping track of the previous record separator in the "PREV" variable.
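If you want to see how the input actually gets split, one quick throwaway check is to print NR, RT and the record body for each record (newlines shown as | just for display):
$ gawk 'BEGIN { RS="(\n|^)\\S+:" }
{ body=$0; sep=RT; gsub(/\n/, "|", body); gsub(/\n/, "|", sep);
  printf "record %d: RT=<%s> body=<%s>\n", NR, sep, body }' input.txt
On the sample input this shows four records: an empty one before "foo:", then the bodies of foo, bar and xyz, with RT holding the separator that ended each record (empty at end of file).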
So, the BEGIN block sets the record separator RS, and initializes PREV to be empty.
The block at the bottom is called once for each record delimited by RS; the text after the last separator forms the final record, so the last group is still picked up at the end of the file.
If "PREV" is not empty, it calls the "process_group" function with the current body data and the previous record separator (stripping off the uninteresting bits from PREV on the way through using gensub). It then assigns the currently matched record separator (RT) to PREV for use next time.
In "process_group", you can do whatever processing you want with each group. In this case I'm just printing them out, but it should be easy to modify it to do whatever you want.
Upvotes: 0
Reputation: 11573
If I'm interpreting your question correctly, you want to simply remove the whitespace and put foo on a different line than the part after the :. This awk script would do that:
awk 'BEGIN{RS="[:\n]"}{$1=$1}1' file
Output:
foo
some text
goes here
and here
and here
bar
more text
goes here
and here
xyz
and more...
and more...
and more...
Explanation:
RS="[:\n]" says that the input should be split into records either at : or at \n.
$1=$1 reprocesses the line into $0 (removes whitespace at the beginning of the line).
1 says that every line should be processed with the "default action", which is print $0.
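In other words, the trailing 1 makes the one-liner equivalent to writing the print out explicitly:
awk 'BEGIN{RS="[:\n]"}{$1=$1; print $0}' file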
Upvotes: 0
Reputation: 16997
Input
$ cat file
foo: some text
goes here
and here
and here
bar: more text
goes here
and here
xyz: and more...
and more...
and more...
Output
$ awk '/:/{f=/^foo/}f' file
foo: some text
goes here
and here
and here
In case you want to skip the matched line itself, then
$ awk '/:/{f=/^foo/;next}f' file
goes here
and here
and here
Or even
# Just modify variable search value
# 1st approach
$ awk -v search="foo" '/:/{f=$0~"^"search}f' file
foo: some text
goes here
and here
and here
# 2nd approach
$ awk -v search="foo" '/:/{f=$0~"^"search;next}f' file
goes here
and here
and here
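And if you use this a lot, a tiny (hypothetical) shell wrapper around the 2nd approach keeps the calls short:
# extract_section NAME FILE -- prints NAME's section without its header line
$ extract_section() { awk -v search="$1" '/:/{f=$0~"^"search;next}f' "$2"; }
$ extract_section bar file
goes here
and here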
Upvotes: 2