user1536435

Reputation: 129

Parse comments and values selectively from a text file using Linux

I want to parse a file that contains blocks of names, some of which have comments above them. Given a file like:

```
Art
Boat
Road
Tree
Street

# Blah
Star
Car
Sun

Sock

# Comm1
# Comm2
Stop
Stick
# Comm
Stock
Dock
```

I want to extract every name starting with 'S' together with its corresponding comments. A name's comments are the closest comment block (one or more consecutive comment lines) above it, provided no blank line lies between them. A comment block applies to every entry that follows it until a blank line or another comment block is encountered. So the output for the input above should be something like:

**Name      Comments**

```
Street
Star        # Blah
Sun         # Blah
Sock
Stop        # Comm1 # Comm2
Stick       # Comm1 # Comm2
Stock       # Comm
```

Can anyone suggest a good way to go about doing this (preferably using shell)? Would really appreciate it. Thanks!

PS: I apologize if my description is unclear; I'm still new at this.

Upvotes: 0

Views: 445

Answers (2)

William Pursell

Reputation: 212248

Assuming your blank lines contain no whitespace:

```shell
sed -n '/^#/H; /^S/{G; y/\n/ /; p}; /^$/h' input
```

The first command (/^#/H) appends the current line (a comment) to the hold space. The next command, which applies only to lines starting with S, appends the hold space (containing all the accumulated comments) to the pattern space, replaces every newline with a single space, and then prints the line. The final command (/^$/h) clears the hold space whenever a blank line is encountered, by copying the empty line into it.
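For readability, here is the same sed program spread over several lines with one comment per command. This is a sketch rather than part of the original answer: the wrapper function name first_sed is hypothetical, and the sed commands themselves are unchanged from the one-liner.

```shell
# first_sed is a hypothetical wrapper name; pass a file name, or pipe input in.
# Assumes blank lines are truly empty (no spaces or tabs).
first_sed() {
  sed -n '
    # Comment line: append a newline plus the line to the hold space.
    /^#/H
    # Name starting with S: append a newline plus the hold space,
    # turn every newline into a space, then print.
    /^S/{
      G
      y/\n/ /
      p
    }
    # Blank line: copy the empty pattern space into the hold space (clears it).
    /^$/h
  ' "$@"
}
```

For example, first_sed input should print Star  # Blah with two spaces (G and H each add a newline) and, for the last name, Stock  # Comm1 # Comm2 # Comm, which is the accumulation problem addressed in the edit below.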

EDIT (thanks blahdiblah)

The above does not reset the accumulator correctly when a new comment block is detected without a preceding blank line. This is ugly, but accounts for that:

```shell
sed -n '/^#/{h; bk}; :j /^S/{G; y/\n/ /; p}; /^$/h; d; :k n; /^#/{ H; bk}; bj;' input
```
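The branching version is easier to follow with one command per line. Again this is only a sketch: fixed_sed is a hypothetical wrapper name, and the commands and labels (j and k) are the same as in the one-liner above.

```shell
# fixed_sed is a hypothetical wrapper name; pass a file name, or pipe input in.
fixed_sed() {
  sed -n '
    # A comment line starts a new block: replace the hold space with it, go to k.
    /^#/{
      h
      bk
    }
    :j
    # Name starting with S: append the current block and print it on one line.
    /^S/{
      G
      y/\n/ /
      p
    }
    # Blank line: clear the hold space.
    /^$/h
    # Nothing more to do with this line.
    d
    :k
    # Read the next line (nothing is printed because of -n).
    n
    # Further comment lines extend the current block.
    /^#/{
      H
      bk
    }
    # First non-comment line after the block: handle it like any other line.
    bj
  ' "$@"
}
```

On the sample input this should print Stop # Comm1 # Comm2 and Stock # Comm with a single space, because each block now starts with h rather than H, so the hold space no longer begins with a newline. Names without comments (Street, Sock) still end in a trailing space.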

Upvotes: 1

blahdiblah

Reputation: 33991

Here's some slightly inelegant awk that does the job:

```shell
awk '/^$/ {ca=""; cp=""} /^#/ {ca=ca " " $0} /^S/ && ca {cp=ca; ca=""} /^S/ {print $0 " " cp}' < input.txt > output.txt
```

There are two stores: the comment accumulator, ca, and the comment print buffer, cp.

  1. Whenever a blank line is encountered, both are cleared.
  2. When a comment line is encountered, it's added to the comment accumulator.
  3. When a line starting with S is encountered and the comment accumulator has content, the comment print buffer is set to whatever's in the comment accumulator and the latter is cleared.
  4. When a line starting with S is encountered, it's printed followed by whatever's in the comment print buffer.
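The four rules map directly onto awk pattern-action pairs, which awk checks in order for every input line. Spread out with one comment per rule, the one-liner looks like this (a sketch: awk_comments is a hypothetical wrapper name, and ca is compared with the empty string explicitly, which behaves the same as the original truth test).

```shell
# awk_comments is a hypothetical wrapper name; pass a file name, or pipe input in.
awk_comments() {
  awk '
    # Rule 1: blank line clears both the accumulator and the print buffer.
    /^$/ { ca = ""; cp = "" }
    # Rule 2: comment line is added to the accumulator.
    /^#/ { ca = ca " " $0 }
    # Rule 3: S-name with pending comments moves them into the print buffer.
    /^S/ && ca != "" { cp = ca; ca = "" }
    # Rule 4: every S-name is printed followed by the print buffer.
    /^S/ { print $0 " " cp }
  ' "$@"
}
```

Rule 3 has to come before rule 4 so that the print in rule 4 already sees the updated cp. On the sample input, awk_comments input.txt should print Stop  # Comm1 # Comm2 and Stock  # Comm, each with two spaces because ca starts with a space.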

There's probably a more elegant way to do this, and this doubtless has problems (e.g., putting a space at the end of lines with no comments), but it'll get you started.
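One way to avoid the trailing space is to print a separator only when there is a comment. The sketch below does that and also starts a fresh block whenever a comment line follows a non-comment line, so a later block always replaces an earlier one even when other names sit between them. The function name s_names_with_comments, the 12-character name column (chosen to resemble the table in the question) and the treatment of space/tab-only lines as blank are assumptions, not part of either answer.

```shell
# s_names_with_comments is a hypothetical name; pass a file name, or pipe input in.
# Assumption: lines containing only spaces or tabs count as blank.
s_names_with_comments() {
  awk '
    # Blank line: no comment block applies any more.
    /^[ \t]*$/ { block = ""; in_comments = 0; next }
    # Comment line: extends the current run of comments,
    # or starts a new block if the previous line was not a comment.
    /^#/ {
      if (in_comments) block = block " " $0
      else block = $0
      in_comments = 1
      next
    }
    # Any other line ends the run of comment lines.
    { in_comments = 0 }
    # S-names: pad the name to a 12-character column, add the block if any.
    /^S/ {
      if (block != "") printf "%-12s%s\n", $0, block
      else print $0
    }
  ' "$@"
}
```

With the sample input, s_names_with_comments input.txt should print Street and Sock with nothing after them, and the other names aligned as in the question.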

Upvotes: 1
