cebach561

Reputation: 165

AWK join lines between record separator

I'm new to awk. I have a file that uses > as a record separator, and below each separator line are random strings. I would like to use awk to print each separator line and join the strings below it into a single line.

Example file:

input:
>1
AAAA
BB
CCCCCCC
>2
AA
BBBBBBB
CCCC
...

output:
>1
AAAABBCCCCCCC
>2
AABBBBBBBCCCC
...

I have this awk program, which works when there is a fixed number of lines below the record separator (as in the first example):

awk 'BEGIN { FS = "\n"; RS = ">" } {print ">"$1 } {print $2$3$4}' file

Is there a way I can use awk to account for any number of strings that might appear below the record separator?

Example:
input:
>1
AAAAAA
BBB
CCCCCCCC
DDDD
FFF
>2
AAAAA
CCC
...

output:
>1
AAAAAABBBCCCCCCCCDDDDFFF
>2
AAAAACCC
...

Upvotes: 1

Views: 2307

Answers (3)

glenn jackman

Reputation: 247012

awk '
    BEGIN  {RS=">"; FS="\n"; OFS=""} 
    NR > 1 {$1 = $1 FS; print RS, $0}
' file
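For anyone wondering why this works, here is a sketch of the same command run on inline sample data. The `join_records` wrapper name and the `printf` input are made up for illustration. Assigning to `$1` makes awk rebuild `$0` by joining the fields with `OFS`, which is empty here, so the sequence lines are concatenated; the `FS` appended to `$1` keeps a newline after the header.

```shell
# join_records is a hypothetical wrapper; the awk program is the one above.
join_records() {
    awk '
        BEGIN  { RS = ">"; FS = "\n"; OFS = "" }
        # NR == 1 is the empty text before the first ">", so skip it.
        # Assigning to $1 rebuilds $0 with the empty OFS between fields.
        NR > 1 { $1 = $1 FS; print RS, $0 }
    '
}

# Sample input mirroring the first example in the question.
printf '>1\nAAAA\nBB\nCCCCCCC\n>2\nAA\nBBBBBBB\nCCCC\n' | join_records
```

Because the fields are joined rather than indexed one by one, the number of lines under each header does not matter.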

Upvotes: 1

Jotne

Reputation: 41460

Here is an awk one-liner, shown with its output for the second example:

awk '/^>/ {print (NR==1?"":RS)$0;next} {printf "%s",$0}' file
>1
AAAAAABBBCCCCCCCCDDDDFFF
>2
AAAAACCC
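One caveat: nothing in the one-liner prints a newline after the final sequence, so the output ends without one. A possible variant (a sketch, not part of the original command; the `join_lines` wrapper and the `pending` variable are names chosen here for illustration) tracks whether a sequence line is still open and closes it in an `END` block.

```shell
# join_lines is a hypothetical wrapper around the variant awk program.
join_lines() {
    awk '
        # Header line: close the previous sequence line if one is open, then print the header.
        /^>/ { if (pending) printf "\n"; print; pending = 0; next }
        # Sequence line: append it without a trailing newline.
        { printf "%s", $0; pending = 1 }
        # Terminate the last sequence line so the output ends with a newline.
        END { if (pending) printf "\n" }
    '
}

# Sample input mirroring the second example in the question.
printf '>1\nAAAAAA\nBBB\nCCCCCCCC\nDDDD\nFFF\n>2\nAAAAA\nCCC\n' | join_lines
```

This keeps the default record separator, so it reads the file line by line instead of holding a whole record in memory.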

Upvotes: 3

Bill Karwin

Reputation: 562691

You can loop from 2 to NF, which is a built-in variable holding the number of fields in the current record.

Print them with printf() to avoid outputting a newline. Then printf() one newline at the end of the record.

awk 'BEGIN { FS = "\n"; RS = ">" }
    NR > 1 {   # record 1 is the empty text before the first ">"
        print ">" $1                               # header line
        for (i = 2; i <= NF; ++i) printf "%s", $i  # join the remaining lines
        printf "\n"
    }' file

Upvotes: 2
