Reputation: 477
Hi everyone, my data looks like this:
samplename 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 ...
samplename2 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 0 0 0 ...
and I want it to look like this:
>samplename
0 1 1 1 1 1 1 1 1 1
1 0 0 0 0 0 0 0 0 ...
>samplename2
0 0 0 0 0 1 1 1 1 1
1 1 1 1 1 1 0 0 0 ...
[note - showing a line break after every 10 digits; I actually want it after every 200, but I realize that showing a line like that would not be very helpful].
I could do this with a regular expression in a text editor, but I want to use sed in bash because I have to do this several times, and I need 200 characters per row.
I tried this but got an error:
sed -e "s/\(>\w+\)\s\([0-9]+\)/\1\n\2" < myfile > myfile2
sed: 1: "s/(>\w+)\s([0-9]+)/ ...": unescaped newline inside substitute pattern
One more note: I am doing this on a Mac, and I know that sed on the Mac is a little different from GNU sed. If you can give me a solution that works on a Mac, that would be great.
Thanks in advance.
Upvotes: 3
Views: 534
Reputation: 98088
fold is your friend:
sed 's/\([^ ]*\) /\1\n/' input | fold -w 100
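One caveat on macOS: BSD sed does not turn \n in the replacement into a newline, so the command above leaves a literal n there. Below is a minimal sketch of the same sed-plus-fold idea that should behave the same on BSD and GNU sed. The function name split_and_fold and the 200-column default are assumptions for illustration, and the added > follows the output format shown in the question.

```shell
# split_and_fold is a hypothetical helper name, not part of sed or fold.
# Usage: split_and_fold [width] < input
split_and_fold() {
  # A backslash followed by a real line break is the portable way to
  # put a newline into a sed replacement (works in BSD and GNU sed).
  # The width defaults to 200, assumed from the question.
  sed 's/^\([^ ]*\) />\1\
/' | fold -w "${1:-200}"
}
```

fold counts characters, not numbers, so a width of 200 gives 100 single-digit values with their spaces. A header line longer than the width would be wrapped too.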
Upvotes: 1
Reputation: 204258
$ awk '{print ">" $1; for (i=2;i<=NF;i++) printf "%s%s", $i, ((i-1)%10 ? FS : RS)}' file
>samplename
0 1 1 1 1 1 1 1 1 1
1 0 0 0 0 0 0 0 0 ...
>samplename2
0 0 0 0 0 1 1 1 1 1
1 1 1 1 1 1 0 0 0 ...
Upvotes: 1
Reputation: 46435
With your added request for a line break after 200 numbers, you are much better off using awk.
echo "hello 1 2 3 4" | awk '{print ">"$1; for(i=2; i<=NF; i++) {printf("%d ",$i); if((i+1)%2 == 0) printf("\n");}}'
prints out
>hello
1 2
3 4
If you want this to work only on lines that start with hello, you can modify as
echo "hello 1 2 3 4" | awk '/^hello / {print ">"$1; for(i=2; i<=NF; i++) {printf("%d ",$i); if((i+1)%2 == 0) printf("\n");}}'
(The regular expression between the / / says "only do this on lines that match this expression".)
To get a newline after every 100 numbers, change the test if((i+1)%2 == 0) to if((i-1)%100 == 0). Note the i-1: field i holds number i-1, and i+1 only happens to give the same result when the group size is 2. I showed it for 2 just because the printout is more readable.
Update: to make this all much cleaner, do the following.
Create a file called breakIt with the following contents (leave out the /^hello / if you don't want to select only lines starting with "hello", but keep the {} around the code; it matters).
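/^hello / { print ">"$1;              # header line: ">" plus the sample name
  for(i=2; i<=NF; i++)
  {
    printf("%d ",$i);
    if((i-1)%100 == 0) printf("\n");  # field i is number i-1; break after every 100th
  }
  print "";                           # finish the last, possibly partial, row
}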
Now you can issue the command
awk -f breakIt inputFile > outputFile
This says "use the contents of breakIt as the commands to process inputFile and put the results in outputFile".
Should do the trick nicely for you.
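As a variation on the script above, the group size can be passed in with awk's -v option instead of being edited into the file. This is only a sketch: the function name wrap_fields and the variable n are made up for illustration, and it prints each field with %s rather than %d so that non-numeric fields such as a trailing ... pass through unchanged.

```shell
# wrap_fields is a hypothetical wrapper; n is an assumed variable name.
# Usage: wrap_fields [numbers_per_row] < inputFile > outputFile
wrap_fields() {
  awk -v n="${1:-100}" '{
    print ">" $1                      # header line: ">" plus the sample name
    for (i = 2; i <= NF; i++) {
      # Field i holds number i-1. End the row after every n-th number
      # and after the last one; otherwise separate with a space.
      sep = ((i - 1) % n == 0 || i == NF) ? "\n" : " "
      printf "%s%s", $i, sep
    }
  }'
}
```

Choosing the separator per field avoids both the trailing space on each row and the extra blank line when the count divides evenly.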
Edit: just in case you really do want a sed solution, here is a nice one (well, I think so). Copy the following into a file called sedSplit:
s/^([A-Za-z0-9]+) />\1\
/g
s/(([0-9] ){10})/\1\
/g
s/$/\
/g
This has three consecutive sed commands; each would sit on its own line, but since they insert newlines, they actually take six lines.
s/^ - substitute, starting from the beginning of the line
([A-Za-z0-9]+) / - match the first word (letters and digits, so samplename2 is covered) plus the space after it, replacing with
>\1\
/g - the literal '>', then the first match, then a newline (the g is harmless here; an anchored pattern can match only once)
s/(([0-9] ){10})/ - match 10 repetitions of [digit followed by space]
\1\
/g - replace with itself, followed by a newline, as often as needed
s/$/\
/g - replace the end of the line with a newline
You invoke this sed script like this:
sed -E -f sedSplit < inputFile > outputFile
This uses the -E flag (use extended regular expressions, so there is no need to escape brackets and such) and the -f flag ('get instructions from this file').
It makes the whole thing much cleaner, and gives you the output you asked for on a Mac (even with an extra blank line to separate the groups; if you don't want that, leave out the last two lines).
Upvotes: 1
Reputation: 923
In double quotes the backslash is interpreted by the shell. Either one of these should work.
sed -e 's/\(>\w+\)\s\([0-9]+\)/\1\n\2/' < myfile > myfile2
sed -e "s/\\(>\\w+\\)\\s\\([0-9]+\\)/\\1\\n\\2/" < myfile > myfile2
PS, I added the terminating slash. You had a s/.../... instead of s/.../.../
PS, as I'm looking at your regexp, sed will complain no end. Try this.
sed -e 's/^\(\w\+\)\s\+/>\1\n/' < myfile > myfile2
Mac version, with a 200-character limit (100 single digits and 100 spaces):
sed -Ee 's/^([a-zA-Z0-9]+) />\1\
/' < myfile | sed -Ee 's/(([0-9] ){99}[0-9]) /\1\
/g' > myfile2
The first sed reads myfile and separates the name from the numbers; the second splits the lines and writes myfile2.
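If the row length needs to change, the count in the second expression can be filled in from a shell variable. The sketch below assumes single-digit values separated by single spaces, as in the question; sed_wrap and n are hypothetical names.

```shell
# sed_wrap is a hypothetical helper.
# Usage: sed_wrap [numbers_per_row] < myfile > myfile2
sed_wrap() {
  n=${1:-100}   # numbers per row; 100 single digits give 199 characters
  # Stage 1: put ">" before the name and a newline after it.
  # Stage 2: after every n-th digit, turn the following space into a newline.
  # The single quotes are closed around $((n - 1)) so the shell can
  # fill in the repetition count.
  sed -E 's/^([a-zA-Z0-9]+) />\1\
/' | sed -E 's/(([0-9] ){'"$((n - 1))"'}[0-9]) /\1\
/g'
}
```

Keeping the regular expression in single quotes and splicing in only the number avoids having to double every backslash.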
Upvotes: 0
Reputation: 247052
plain bash:
while read -r name values; do
printf ">%s\n%s\n" "$name" "$values"
done <<END
samplename 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 ...
samplename2 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 0 0 0 ...
END
>samplename
0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 ...
>samplename2
0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 0 0 0 ...
This assumes the sample name does not contain whitespace.
Upvotes: 0