Reputation: 64054
I have data that looks like this:
> sq1
foofoofoobar
foofoofoo
> sq2
quxquxquxbar
quxquxquxbar
quxx
> sq3
foofoofoobar
foofoofoo
> sq4
foofoofoobar
foofoo
I want to join the data lines that follow each "> sqN" header into a single line, using the header as the cut-off line, i.e. yielding:
foofoofoobarfoofoofoo
quxquxquxbarquxquxquxbarquxx
foofoofoobarfoofoofoo
foofoofoobarfoofoo
I tried this sed command, but it fails:
sed '/^S/d;N;s/\n/\t/'
What's the correct way to do it?
Upvotes: 0
Views: 2673
Reputation: 360445
You're testing for a capital "S" at the beginning of the line. You should be testing for the greater-than character:
sed '/^>/d;N;s/\n/\t/'
or
sed '/^> sq/d;N;s/\n/\t/'
Edit: I missed the fact that there are variable numbers of lines between the headers. This is what I have so far:
sed -n '/^>/{x; p; d}; /^>/!H; x; s/\n/\t/; h; $p'
Unfortunately, this leaves in the header:
> sq1 foofoofoobar foofoofoo
> sq2 quxquxquxbar quxquxquxbar quxx
> sq3 foofoofoobar foofoofoo
> sq4 foofoofoobar foofoo
If you do this from a Bash prompt, you may have to run set +H first so that the exclamation point does not trigger history expansion.
Edit2: My revised version that gets rid of the headers:
sed -n '1{x;d};/^>/{x; p; d}; H; x; s/\n/\t/; s/^>.*\t//; h; $p'
Upvotes: 1
Reputation: 4827
#!/bin/sed -f
# If this is a header line, empty it...
s/^>.*//
# ... and then jump to the 'end' label.
t end
# Otherwise, append this data line to the hold space.
H
# If this is not the last line, continue to the next line.
$!d
# Otherwise, this is the end of the file or the start of a header.
: end
# Call up the data lines we last saw (putting the empty line in the hold).
x
# If we haven't seen any data lines recently, continue to the next line.
/^$/d
# Otherwise, strip the newlines and print.
s/\n//g
# The one-line version:
# sed -e 's/^>.*//;te' -e 'H;$!d;:e' -e 'x;/^$/d;s/\n//g'
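
As a quick check, the one-line version can be run against sample data shaped like the question's. This is only a sketch: the wrapper name `join_sq` is made up for illustration, and it assumes a sed that treats the end of an `-e` fragment as the end of a label (GNU sed does).

```shell
# join_sq is a hypothetical wrapper name, not part of the script above.
join_sq() {
    sed -e 's/^>.*//;te' -e 'H;$!d;:e' -e 'x;/^$/d;s/\n//g' "$@"
}

# Feed it two records from the question's sample input.
printf '%s\n' '> sq1' foofoofoobar foofoofoo \
              '> sq2' quxquxquxbar quxquxquxbar quxx | join_sq
```

For this input it should print `foofoofoobarfoofoofoo` followed by `quxquxquxbarquxquxquxbarquxx`, one per line.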
Upvotes: 3
Reputation: 4675
I can't find a simple way to do it in sed. Anyway, with gawk/mawk you just have to set the RS variable to a regex that matches the header and strip the newline characters:
awk -v RS='> sq[0-9]+' 'NR>1{gsub(/\n/,"");print}' file
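
A multi-character RS is not guaranteed to work in every awk, so here is a minimal sketch that keeps the default record separator and accumulates data lines instead. The function name `join_records` and the variable `buf` are illustrative, and it assumes that every header line starts with `>`.

```shell
# join_records is a hypothetical helper; it reads the named files or stdin.
join_records() {
    awk '
        # Header line: flush the record collected so far, then start over.
        /^>/ { if (buf != "") print buf; buf = ""; next }
        # Data line: append it to the current record with no separator.
        { buf = buf $0 }
        # End of input: flush the last record.
        END { if (buf != "") print buf }
    ' "$@"
}

printf '%s\n' '> sq1' foofoofoobar foofoofoo \
              '> sq2' quxquxquxbar quxquxquxbar quxx | join_records
```

Unlike the RS approach, this does not care what follows the `>` in the header, at the cost of a few more lines.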
Upvotes: 1
Reputation: 40849
A bash solution for the original question (i.e. without "headers"):
#!/bin/bash
# Read every line of the input file (first argument) into an array.
text=()
while IFS= read -r line
do
    text+=("$line")
done < "$1"
# Keep joining lines until one is shorter than the line before it;
# that shorter line is treated as the end of a record.
len=0
for string in "${text[@]}"
do
    if [ "$len" -le "${#string}" ] ; then
        printf '%s' "$string"
    else
        printf '%s\n' "$string"
    fi
    len=${#string}
done
printf '\n'
Upvotes: 1