Reputation: 165
I'm new to awk and have a question. I have a file that uses > as a record separator, and each separator is followed by a varying number of lines containing arbitrary strings. I would like to use awk to print each record header and join the strings below it into a single line.
Example file:
input:
>1
AAAA
BB
CCCCCCC
>2
AA
BBBBBBB
CCCC
...
output:
>1
AAAABBCCCCCCC
>2
AABBBBBBBCCCC
...
I have this awk program, which works when there is a fixed number of lines below the record separator (as in the first example):
awk 'BEGIN { FS = "\n"; RS = ">" } {print ">"$1 } {print $2$3$4}' file
Is there a way I can use awk to account for any number of strings that might appear below the record separator?
Example:
input:
>1
AAAAAA
BBB
CCCCCCCC
DDDD
FFF
>2
AAAAA
CCC
...
output:
>1
AAAAAABBBCCCCCCCCDDDDFFF
>2
AAAAACCC
...
Upvotes: 1
Views: 2307
Reputation: 247012
awk '
BEGIN {RS=">"; FS="\n"; OFS=""}
NR > 1 {$1 = $1 FS; print RS, $0}
' file
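For readers who want to see why this works, here is the same command as a commented sketch. With RS set to ">", each record is one header plus its sequence lines, and with FS set to a newline each line becomes a field. Assigning to $1 makes awk rebuild $0 with OFS, which is empty, so the fields are concatenated. The wrapper name join_records and the sample data piped in are only for illustration.

```shell
# join_records is a hypothetical wrapper name used only for this sketch.
join_records() {
  awk '
    BEGIN { RS = ">"; FS = "\n"; OFS = "" }
    # The text before the first ">" is an empty record, so skip NR == 1.
    NR > 1 {
      $1 = $1 FS      # keep the newline after the header; rebuilds $0 with OFS=""
      print RS, $0    # put back the ">" that awk consumed as the separator
    }
  ' "$@"
}

# First sample input from the question.
printf '>1\nAAAA\nBB\nCCCCCCC\n>2\nAA\nBBBBBBB\nCCCC\n' | join_records
```

Run on the first sample input, this prints >1, AAAABBCCCCCCC, >2 and AABBBBBBBCCCC, each on its own line.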
Upvotes: 1
Reputation: 41460
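Here is an awk one-liner that keeps the default record separator and works line by line: header lines are printed on a line of their own, all other lines are appended without a newline, and the END block terminates the last joined line.
awk '/^>/ {print (NR==1?"":RS)$0;next} {printf "%s",$0} END {if (NR) print ""}' file
Output for the second example: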
>1
AAAAAABBBCCCCCCCCDDDDFFF
>2
AAAAACCC
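The same line-by-line idea can be spelled out as a commented sketch. Because it keeps the default record separator, awk reads one line at a time rather than a whole ">"-delimited record, which may help with very long sequences. The started flag and the wrapper name join_lines are assumptions of this sketch, not part of the one-liner above.

```shell
# join_lines is a hypothetical wrapper name used only for this sketch.
join_lines() {
  awk '
    /^>/ {                            # header line
      if (started) printf "\n"        # close the previous joined sequence
      print                           # print the header on its own line
      started = 1
      next
    }
    { printf "%s", $0 }               # sequence line: append without a newline
    END { if (started) printf "\n" }  # terminate the last joined sequence
  ' "$@"
}

# Second sample input from the question.
printf '>1\nAAAAAA\nBBB\nCCCCCCCC\nDDDD\nFFF\n>2\nAAAAA\nCCC\n' | join_lines
```

Run on the second sample input, this prints >1, AAAAAABBBCCCCCCCCDDDDFFF, >2 and AAAAACCC, each on its own line.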
Upvotes: 3
Reputation: 562691
You can loop from 2 to NF, which is a built-in variable holding the number of fields.
Print each field with printf to avoid outputting a newline, then printf one newline at the end of the record.
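awk 'BEGIN { FS = "\n"; RS = ">" }
NR == 1 { next }  # skip the empty record before the first ">"
{ print ">" $1 }
{ for (i = 2; i <= NF; ++i) printf "%s", $i }  # "%s" keeps a % in the data from being read as a format
{ printf "\n" }' file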
Upvotes: 2