Reputation: 43
I have thousands of files, which are a list of sequence names followed by their sequence, one individual per line, something like this:
L.abdalai.LJAMM.14363.SanMartindeLosAndes CCCTAAGAATAATTTGTT
L.carlosgarini.LJAMM.14070.LagunadelMaule CCCTAAGAAT-ATTTGTT
L.cf.silvai.DD.038.Sarco CCCTAAGAAT-ATTTGTT
And I want to change them to fasta format, so looking something like:
>L.abdalai.LJAMM.14363.SanMartindeLosAndes
CCCTAAGAATAATTTGTTCAGAAAAGATATTTAATTATAT
>L.carlosgarini.LJAMM.14070.LagunadelMaule
CCCTAAGAAT-ATTTGTTCAGAAAAGATATTTAATTATAT
>L.cf.silvai.DD.038.Sarco
CCCTAAGAAT-ATTTGTTCAGAAAAGATATTTAATTATAT
I work on a Mac.
Thanks!
Upvotes: 2
Views: 1256
Reputation: 5252
I believe you simplied your sample input, thus different from your expected output.
If not so, and my solutions not work, please comment under my answer to let me know.
So with awk, you can do it like this:
awk -v OFS="\n" '$1=">" $1' file
>L.abdalai.LJAMM.14363.SanMartindeLosAndes
CCCTAAGAATAATTTGTT
>L.carlosgarini.LJAMM.14070.LagunadelMaule
CCCTAAGAAT-ATTTGTT
>L.cf.silvai.DD.038.Sarco
CCCTAAGAAT-ATTTGTT
If you want to change inplace, please install GNU gawk, and use gawk -i inplace ....
And if you want the line endings be Carriages, add/change to -v ORS="\r" -v OFS="\r"
However, you can also, and maybe it's better to do it with sed
:
sed -e 's/\([^[:space:]]*\)[[:space:]]*\([^[:space:]]*\)/>\1\n\2/' file
Add -i''
like this: sed -i'' -e ...
to change file inplace.
Upvotes: 2
Reputation: 8711
Using Perl
perl -pe 's/^/</;s/(\S+)\s+(\S+)/$1\n$2CAGAAAAGATATTTAATTATAT/g ' file
with your inputs
$ cat damien.txt
L.abdalai.LJAMM.14363.SanMartindeLosAndes CCCTAAGAATAATTTGTT
L.carlosgarini.LJAMM.14070.LagunadelMaule CCCTAAGAAT-ATTTGTT
L.cf.silvai.DD.038.Sarco CCCTAAGAAT-ATTTGTT
$ perl -pe 's/^/</;s/(\S+)\s+(\S+)/$1\n$2CAGAAAAGATATTTAATTATAT/g ' damien.txt
<L.abdalai.LJAMM.14363.SanMartindeLosAndes
CCCTAAGAATAATTTGTTCAGAAAAGATATTTAATTATAT
<L.carlosgarini.LJAMM.14070.LagunadelMaule
CCCTAAGAAT-ATTTGTTCAGAAAAGATATTTAATTATAT
<L.cf.silvai.DD.038.Sarco
CCCTAAGAAT-ATTTGTTCAGAAAAGATATTTAATTATAT
$
Upvotes: 2
Reputation: 133518
Could you please try following(created and tested based on your samples, since I don't have mac to didn't test on it).
awk '/^L\./{print ">"$1 ORS $2 "CAGAAAAGATATTTAATTATAT"}' Input_file
Output will be as follows. If needed you could take it to a output_file by appending > output_file
to above command too.
>L.abdalai.LJAMM.14363.SanMartindeLosAndes
CCCTAAGAATAATTTGTTCAGAAAAGATATTTAATTATAT
>L.carlosgarini.LJAMM.14070.LagunadelMaule
CCCTAAGAAT-ATTTGTTCAGAAAAGATATTTAATTATAT
>L.cf.silvai.DD.038.Sarco
CCCTAAGAAT-ATTTGTTCAGAAAAGATATTTAATTATAT
Upvotes: 1