JM88

Reputation: 477

Edit data removing line breaks and putting everything in a row

Hi, I'm new to shell scripting and have been unable to do this:

My data looks like this (much bigger actually):

 >SampleName_ZN189A
 01000001000000000000100011100000000111000000001000
 00110000100000000000010000000000001100000010000000
 00110000000000001110000010010011111000000100010000
 00000110000001000000010100000000010000001000001110
 0011
 >SampleName_ZN189B
 00110000001101000001011100000000000000000000010001
 00010000000000000010010000000000100100000001000000
 00000000000000000000000010000000000010111010000000
 01000110000000110000001010010000001111110101000000
 1100

Note: There is a line break after every 50 characters, but the last line of each sample can be shorter, because the data ends there and a new sample name follows.

I would like the line breaks after every 50 characters to be removed, so that my data looks like this:

 >SampleName_ZN189A
 0100000100000000000010001110000000011100000000100000110000100000000000010000000000001100000010000000...
 >SampleName_ZN189B
 0011000000110100000101110000000000000000000001000100010000000000000010010000000000100100000001000000...

I tried using tr, but I got an error:

tr '\n' '' < my_file

tr: empty string2

Thanks in advance

Upvotes: 1

Views: 209

Answers (6)

potong

Reputation: 58391

This might work for you (GNU sed):

sed '/^\s*>/!{H;$!d};x;s/\n\s*//2gp;x;h;d' file

Build up the record in the hold space and, on encountering the start of the next record or the end of the file, remove the newlines and print the result.
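
To make the hold-space logic easier to follow, here is a sketch of the same sed commands written one per line with comments. The wrapper name join_records and the short sample input are made up for illustration, and GNU sed is assumed (for \s and the combined 2g flags).

```shell
# join_records is a hypothetical wrapper name; the sed commands are the
# ones from the one-liner above, split one per line (GNU sed assumed).
join_records() {
  sed '
    # Non-header line: append it to the hold space, then delete it and
    # read the next line, unless this is the last line of input.
    /^\s*>/!{H;$!d}
    # Header line or last line: fetch the record collected so far.
    x
    # Remove the second and all later newlines; print only if changed.
    s/\n\s*//2gp
    # Swap back, start the next record with the current line, and
    # suppress the automatic print.
    x
    h
    d
  ' "$@"
}

# Made-up sample with two short records.
printf '%s\n' '>A' '0100' '0011' '>B' '0011' '1100' | join_records
```

One thing to watch, as far as I can tell: the p flag fires only when a substitution actually happens, so a record whose sequence fits on a single line (no second newline to remove) would not be printed by this script.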

Upvotes: 1

BMW

Reputation: 45243

Using awk:

awk '/>/{print (NR==1)?$0:RS $0;next}{printf "%s", $0}' file

If you don't mind an extra newline at the start of the output, here is a shorter one:

awk '{printf "%s", (/>/?RS $0 RS:$0)}' file
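
Both commands leave the last sequence without a terminating newline. A possible variant that closes every sequence line is sketched below; join_fasta and the pending flag are hypothetical names, and the sample input is made up.

```shell
# join_fasta is a hypothetical helper name, not part of the answer above.
join_fasta() {
  awk '
    # Header line: finish the sequence in progress, then print the header.
    /^ *>/ { if (pending) print ""; print; pending = 0; next }
    # Sequence line: append it to the current output line.
    { printf "%s", $0; pending = 1 }
    # End of input: terminate the last sequence line.
    END { if (pending) print "" }
  ' "$@"
}

# Made-up sample with two short records.
printf '%s\n' '>A' '0100' '0011' '>B' '0011' '1100' | join_fasta
```

Passing the text through "%s" rather than using it as the printf format also keeps a stray % in a sample name from being interpreted.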

Upvotes: 1

Fidel

Reputation: 1037

Try this:

cat SampleName_ZN189A | tr -d '\n'
# tr -d deletes the specified character from the input ('\n' is the Unix line break; use '\r' as well if the file has Windows line endings)

The same can be achieved with a simple awk:

awk 'BEGIN{ORS=""} {print}' SampleName_ZN189A
# The output does not contain a line break at the end. If you want one, this works:

awk 'BEGIN{ORS=""} {print}END{print "\n"}' SampleName_ZN189A
# Select the correct line break character, i.e. \r, \n or \r\n, depending on the file format.

Upvotes: 0

anubhava

Reputation: 785108

You can use this awk:

awk '/^ *>/{if (s) print s; print; s="";next} {s=s $0;next} END {print s}' file

>SampleName_ZN189A
010000010000000000001000111000000001110000000010000011000010000000000001000000000000110000001000000000110000000000001110000010010011111000000100010000000001100000010000000101000000000100000010000011100011
>SampleName_ZN189B
001100000011010000010111000000000000000000000100010001000000000000001001000000000010010000000100000000000000000000000000000010000000000010111010000000010001100000001100000010100100000011111101010000001100

Upvotes: 1

sat

Reputation: 14949

You can use this sed:

sed '/^>Sample/!{ :loop; N; /\n>Sample/{n}; s/\n//; b loop; }' file.txt

Upvotes: 0

Varun

Reputation: 691

tr with "-d" deletes the specified character:

$ cat input.txt
00110000001101000001011100000000000000000000010001
00010000000000000010010000000000100100000001000000
00000000000000000000000010000000000010111010000000
01000110000000110000001010010000001111110101000000
1100
$ cat input.txt | tr -d "\n"
001100000011010000010111000000000000000000000100010001000000000000001001000000000010010000000100000000000000000000000000000010000000000010111010000000010001100000001100000010100100000011111101010000001100
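
For context, the error in the question most likely arises because tr without -d translates characters of the first set into the second, so the second set cannot be empty; -d takes a single set and deletes it. The sketch below, on made-up input with a hypothetical flatten helper, shows what that does when sample-name lines are present.

```shell
# flatten is a hypothetical helper name wrapping the tr call shown above.
flatten() {
  tr -d '\n'
}

# Made-up two-record input in the format from the question.
# Every newline is removed, so the header lines are merged into the data.
printf '%s\n' '>A' '0100' '0011' '>B' '0011' '1100' | flatten
```

This prints everything on one line with no trailing newline, so on its own it suits an input that holds a single sequence and no header lines.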

Upvotes: 2
