user3489905

Reputation: 45

Read line by line between a pattern, then print in delimited format

I have an ASCII file with the contents given below:

START
this is my home
this is my pc

START
this is my linux
this is my awk
this is nice

START
this is a single line

START
this is my work
this is the end
this line has to be read

START
...
...

START
.
.
.
.

I want to read the lines between START and the next blank line and print them in delimited format. The output should be in the format below:

this is my home;this is my pc
this is my linux;this is my awk;this is nice
this is a single line
this is my work;this is the end;this line has to be read

I have used a semicolon as the delimiter. Please note: the number of lines between START and the blank line is not fixed.

I have tried using awk, but I am only able to read one line after START:

awk 'BEGIN { RS = "START" } ; { print $1 }'

Can anyone point me in the right direction?

Thanks

Upvotes: 2

Views: 90

Answers (4)

mklement0

Reputation: 439247

The accepted answer doesn't preserve the individual lines in each block of lines as separate fields, to be separated with ; in the output; the following does:

awk -v RS='' -F'\n' -v OFS=';' '{sub(/^START\n/,""); $1=$1; print }' file
  • RS='' (setting the input record separator RS to an empty string) is an awk idiom with special meaning: it breaks the input into blocks of lines based on empty lines as the separators; in other words: each block of contiguous, non-empty lines forms one record.
  • -F'\n' sets the input field separator (also accessible as special variable FS) to a newline, so that each line in each record (block of lines) will become its own field.
  • OFS=';' sets the output field separator to ;, as requested by the OP.
  • sub(/^START\n/,"") strips the START line (plus its trailing newline) from each record (block of lines).
  • $1=$1 is a trick that, by assigning to a field variable, causes the input record to be rebuilt from its individual fields using the value of OFS as the separator; here, the individual lines (without their trailing newlines) are effectively joined with ; to form a single line.
  • print simply outputs the rebuilt record.
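
To try the command without a separate file, here is a minimal self-contained sketch. The sample input is taken from the question; the temporary file and the variable names input and result are only for illustration.

```shell
# Write the first three blocks from the question to a temporary file.
input=$(mktemp)
printf '%s\n' \
  'START' 'this is my home' 'this is my pc' '' \
  'START' 'this is my linux' 'this is my awk' 'this is nice' '' \
  'START' 'this is a single line' > "$input"

# Paragraph mode (RS=''), one field per line (-F'\n'), fields re-joined with ';'.
result=$(awk -v RS='' -F'\n' -v OFS=';' '{ sub(/^START\n/, ""); $1 = $1; print }' "$input")
printf '%s\n' "$result"
rm -f "$input"
```

Note that a block consisting of a START line alone would be printed as START, because sub() only matches START followed by a newline.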

Upvotes: 0

dave sines

Reputation: 1

This builds a single string containing relevant parts of the input file with the blocks separated by '\n' and the lines separated by ';'.

awk '
  t && $0 == "" { t = 0 ; sep = "\n" }
  t             { hold = hold sep $0 ; sep = ";" }
  $0 == "START" { t = 1 }
  END           { print hold }
' file

The first line deals with the end of a block.

If the in-a-block trigger is set, the second line appends a separator (either "", "\n" or ";" as appropriate) and the current record to the hold buffer.

The third line sets the trigger when a block starts -- if a block has already been started, the "START" line will be treated as part of the block.
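
If holding the whole output in memory is a concern, the same trigger idea can be sketched so that each block is printed as soon as it ends. This is a variant under my own assumptions, not the code above: the variable name line and the END handling for a final block without a trailing blank line are additions.

```shell
# Two blocks from the question; the last one has no trailing blank line.
result=$(printf '%s\n' \
  'START' 'this is my home' 'this is my pc' '' \
  'START' 'this is a single line' |
awk '
  !t && $0 == "START" { t = 1; line = ""; sep = ""; next }          # a block opens
  t && $0 == ""       { if (line != "") print line; t = 0; next }   # a blank line closes it
  t                   { line = line sep $0; sep = ";" }             # collect lines inside a block
  END                 { if (t && line != "") print line }           # flush an unterminated last block
')
printf '%s\n' "$result"
```

As in the original, a START line that appears inside an open block is kept as part of that block.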

Upvotes: 0

Ed Morton

Reputation: 203995

$ awk -v RS= '{$1=$1} sub(/^START /,"")' file
this is my home this is my pc
this is my linux this is my awk this is nice
this is a single line
this is my work this is the end this line has to be read

Upvotes: 1

Jotne

Reputation: 41460

You can do this:

awk -v RS="" '{$1=$1}1' file
START this is my home this is my pc
START this is my linux this is my awk this is nice
START this is a single line
START this is my work this is the end this line has to be read

To make sure each section contains START and remove it:

awk -v RS="" '{$1=$1} /^START/ {gsub(/^START /,"");print}' file
this is my home this is my pc
this is my linux this is my awk this is nice
this is a single line
this is my work this is the end this line has to be read

Some additional information on why your awk failed:
Your command prints only $1, which is the first whitespace-separated field of each record, not the whole record.
You need to rebuild each record after changing RS, by using $1=$1.
Then print the whole record with 1 or {print $0}.
So, to make your awk work:

awk 'BEGIN { RS = "START" } {$1=$1} 1' file

or like this

awk -v RS="START" '{$1=$1} NR>1' file

The NR>1 prevents the first blank line from being printed.

A multi-character RS makes this less portable (POSIX leaves the behavior unspecified), so you need an awk that supports it, such as GNU awk.
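
The effect of $1=$1 described above can be checked in isolation. This is a small sketch with made-up input (the words one, two, three and the variable names before and after are only for illustration); it uses an empty RS, so it does not depend on GNU awk.

```shell
# Without $1=$1 the record is printed unchanged, embedded newlines included.
before=$(printf 'START\none two\nthree\n' | awk -v RS='' '1')

# With $1=$1 the record is rebuilt from its fields, joined by OFS (a space by default).
after=$(printf 'START\none two\nthree\n' | awk -v RS='' '{ $1 = $1 } 1')

printf 'before:\n%s\nafter:\n%s\n' "$before" "$after"
```

With the default FS, every run of whitespace (newlines included) separates fields, which is why all words end up on one line separated by single spaces.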

Upvotes: 2
