neversaint

Reputation: 64054

Joining Line Breaks With Condition in SED

I have data that looks like this:

> sq1
foofoofoobar
foofoofoo
> sq2
quxquxquxbar
quxquxquxbar
quxx
> sq3
foofoofoobar
foofoofoo
> sq4
foofoofoobar
foofoo

I want to join the data lines that follow each "> sqN" header, using the header as the cut-off line, to yield:

foofoofoobarfoofoofoo
quxquxquxbarquxquxquxbarquxx
foofoofoobarfoofoofoo
foofoofoobarfoofoo

I tried this sed command, but it fails:

sed '/^S/d;N;s/\n/\t/' 

What's the correct way to do it?

Upvotes: 0

Views: 2673

Answers (4)

Dennis Williamson

Reputation: 360445

You're testing for a capital "S" at the beginning of the line. You should be testing for the greater-than character:

sed '/^>/d;N;s/\n/\t/'

or

sed '/^> sq/d;N;s/\n/\t/'

Edit: I missed the fact that there are variable numbers of lines between the headers. This is what I have so far:

sed  -n '/^>/{x; p; d}; /^>/!H; x; s/\n/\t/; h; $p'

Unfortunately, this leaves in the header:

> sq1    foofoofoobar    foofoofoo
> sq2    quxquxquxbar    quxquxquxbar    quxx
> sq3    foofoofoobar    foofoofoo
> sq4    foofoofoobar    foofoo

If you run this from a Bash prompt, you may have to run set +H first so that the exclamation point does not trigger history expansion.

Edit2: My revised version that gets rid of the headers:

sed  -n '1{x;d};/^>/{x; p; d}; H; x; s/\n/\t/; s/^>.*\t//; h; $p'
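
The revised command above still joins the lines with tabs, while the question asks for no separator at all. One way to close that gap is to delete the tabs afterwards. This is a minimal sketch, assuming GNU sed (for \t and the brace syntax) and data lines that contain no tabs of their own; the function name join_tabs_removed is made up for illustration:

```shell
# Hypothetical wrapper around the revised command above; assumes GNU sed.
join_tabs_removed() {
    sed -n '1{x;d};/^>/{x; p; d}; H; x; s/\n/\t/; s/^>.*\t//; h; $p' |
        tr -d '\t'   # drop the tab separators so each record becomes one unbroken string
}

printf '%s\n' '> sq1' 'foofoofoobar' 'foofoofoo' '> sq2' 'quxquxquxbar' 'quxquxquxbar' 'quxx' |
    join_tabs_removed
```

With this sample the two records should come out as foofoofoobarfoofoofoo and quxquxquxbarquxquxquxbarquxx.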

Upvotes: 1

Mark Edgar

Reputation: 4827

#!/bin/sed -f

# If this is a header line, empty it...
s/^>.*//
# ... and then jump to the 'end' label.
t end
# Otherwise, append this data line to the hold space.
H
# If this is not the last line, continue to the next line.
$!d
# Otherwise, this is the end of the file or the start of a header.
: end
# Call up the data lines we last saw (putting the empty line in the hold).
x
# If we haven't seen any data lines recently, continue to the next line.
/^$/d
# Otherwise, strip the newlines and print.
s/\n//g

# The one-line version:
# sed -e 's/^>.*//;te' -e 'H;$!d;:e' -e 'x;/^$/d;s/\n//g'
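
To try the one-line version, it can be wrapped in a function and fed a short sample in the question's format. This is only a usage sketch: the function name join_seq is hypothetical, and the way the te / :e label is split across -e chunks is assumed to behave as it does in GNU sed.

```shell
# Hypothetical wrapper around the one-line version above.
join_seq() {
    sed -e 's/^>.*//;te' -e 'H;$!d;:e' -e 'x;/^$/d;s/\n//g'
}

# Two records taken from the question's sample data.
printf '%s\n' '> sq1' 'foofoofoobar' 'foofoofoo' '> sq4' 'foofoofoobar' 'foofoo' | join_seq
```

Because the data lines are collected in the hold space and only printed when the next header (or the end of input) arrives, the number of lines per record does not matter.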

Upvotes: 3

marco

Reputation: 4675

I can't find a simple way to do it in sed. With gawk or mawk, though, you only have to change the RS (record separator) variable and strip the newline characters:

awk -v RS='> sq[0-9]' 'NR>1{gsub(/\n/,"");print}' file
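
A regular expression in RS is an extension that gawk and mawk provide. Where that is not available, or where the headers are not all of the form "> sq" plus one digit, the same result can be approximated by treating any line starting with ">" as a record boundary. A minimal sketch; the function name join_records and the variable name seq are illustrative only:

```shell
# join_records: hypothetical helper; reads stdin, prints one joined line per ">" header.
join_records() {
    awk '
        /^>/ { if (seq != "") print seq; seq = ""; next }  # header: flush the previous record
             { seq = seq $0 }                              # data line: append with no separator
        END  { if (seq != "") print seq }                  # flush the final record
    '
}

printf '%s\n' '> sq1' 'foofoofoobar' 'foofoofoo' '> sq2' 'quxquxquxbar' 'quxx' | join_records
```

On a real file this would be called as join_records < file.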

Upvotes: 1

Michael F

Reputation: 40849

A Bash solution for the original version of the question (i.e. without the header lines). It ends a record whenever a line is shorter than the line before it:

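#!/bin/bash
# Read every line of the input file (first argument) into an array.
text=()
while IFS= read -r line
do
    text+=("$line")
done < "$1"

# Print each line; a line shorter than the previous one closes the record.
j=0
len=0
while [ "$j" -lt "${#text[@]}" ]
do
    string=${text[$j]}
    if [ "$len" -le "${#string}" ] ; then
        printf '%s' "$string"
    else
        printf '%s\n' "$string"
    fi
    len=${#string}
    j=$((j + 1))
done
printf '\n'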
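
The script above depends on line lengths to find the record boundaries. For input that still contains the "> sqN" header lines, the records can instead be split on the headers themselves. A minimal sketch in plain POSIX shell, assuming every header starts with ">"; the function name join_by_header is hypothetical:

```shell
# join_by_header: hypothetical name; reads stdin, prints one joined line per header.
join_by_header() {
    buf=''
    # The "|| [ -n ... ]" part keeps a final line that lacks a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            '>'*)
                # Header: flush the record collected so far, then start a new one.
                if [ -n "$buf" ]; then printf '%s\n' "$buf"; fi
                buf=''
                ;;
            *)
                # Data line: append with no separator.
                buf=$buf$line
                ;;
        esac
    done
    # Flush the last record, which has no header after it.
    if [ -n "$buf" ]; then printf '%s\n' "$buf"; fi
}

printf '%s\n' '> sq1' 'foofoofoobar' 'foofoofoo' '> sq2' 'quxquxquxbar' 'quxx' | join_by_header
```

A shell loop like this is much slower than sed or awk on large files, so it is mainly useful when neither tool is an option.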

Upvotes: 1
