Reputation: 55
I have a file that I want to split into many smaller parts, and I want to do it with Python.
For example, the data in my file looks like this:
>2165320 21411 200802 8894-,...,765644-
TTCGGAGCTTACTAATTTTAAATATGAAGAATGCCAATATAAGTTTTGATTTCGAAAATACTTTTTTACTAGTTAAAAATTCATGATTTTCTACATCTATAACAATTTGTGTTTTTTTTAAACATCTTCCAGTGTCCTAAGTGTATATTTTTTAACGCAATGTTTGAATACTTTTAGGGTTTACCTTATTTAATTTGATTTTTAATGTGAGTTGTAATCACTGGTGAGCATACTGTTTTTCTTTTGTTCAGTAATATTGCATTTGTAGCTTTTGTATTGCTTAGATATATCACATTAAATCCTTTGTTCAGAAACCCATCCGACAGGGAGTCATAGGTGCCACACTAGTGGTCGAGGATCTAGGATGTCGGAAGGTCAACAATGGGGTAAAACACTAATTTTTTAATTTCTTGTATTTACCAAATTTACTGATTTTGCATTTAGTAGATGGTATATATACTCTTCTACCTTGTACAGTTGATGGTACCTGACTAAATATGTTTTATTTCCTTCTCCAGGATCTTTATGTAGTACGATTCTACAGTCGTCAAGAGGAGGGTAGAAAAGGAGAAGTAAGTTATAATATTTCTGAGCTTTTTTCTTTTTAATTGTTGTTGATAGAAAGTTGTGCCATATACATGTTTTAAGGTGGTGTA
>2165799 14641 135356 16580+,...,680341-
AAGGTAGGAGGTACTCGTGCTAATGGAGGAGCTAATGGTACACCAAACCGACGGCTGTCACTTAATGCTCATCAAAACGGAAGCAGGTCCACAACAAAAGATGGAAAAAAAGACATCAGACCAGTTGCTCCTGTGAATTATGTGGCCATATCAAAAGAAGATGCTGCTTCCCATGTTTCTGGTACCGAACCAATCCCGGCATCACCCTAATAATGAGATCTTCATTATCAACCCTACAATTTCATCTTTGTAGCATGATCAAATACTAGTTACTGCTTTAGGAATTATAATATGGAGTGACAAGTAATTAGAGAGGAACTGTTTTGAGCTGTGTATGTTCAATTTGCCATTTGGAGGTTTTCTCAATACATGTGCCCTTTAATATGAAAATATAGTGCTATTCTTGCCTTTCTCCAAACCCTGGCTCCTCCTATTCATCGGTTTCTT
>2169677 23891 1928391 1298391,…..,739483-
CTAGCTGATCGAGCTGATCGTAGTGAGCTATCGAGCTGACTACTAGCTAGTCGTGATAGCTGATCGAGCTGACTGATGTGCTAGTAGTAGTTTCATGATTTTCTACATCTATAACAATTTGTGTTTTTTTTAAACATCTTCCAGTGTCCTAAGTGTATATTTTTTAACGCAATGTTTGAATACTTTTAGGGTTTACCTTATTTAATTTGATTTTTAATGTGAGTTGTAATCACTGGTGAGCATACTGTTTTTCTTTTGTTCAGTAATATTGCATTTGTAGCTTTTGTATTGCTTAGATATATCACATTAAATCCTTTGTTCAGAAACCCATCCGACAGGGAGTCATAGGTGCCACACTAGTGGTCGAGGATCTAGGATGTCGGAAGGTCAACAATGGGGTAAAACACTAATTTTTTAATTTCTTGTATTTACCAAATTTACTGATTTTGCATTTAGTAGATGGTATATATACTCTTCTACCTTGTACAGTTGATGGTACCTGACTAAATATGTTTTATTTCCTTCTCCAGGATCTTTATGTAGTACGATTCTACAGTCGTCAAGAGGAGGGTAGAAAAGGAGAAGTAAGTTATAATATTTCTGAGCTTTTTTCTTTTTAATTGTTGTTGATAGAAAGTTGTGCCATATACATGTTTTA
And so on.
So I want to split the file from each '>' sign up to the next one and store each such part in a separate file.
So the 1st file will contain:
>2165320 21411 200802 8894-,...,765644-
TTCG…..GTA
The 2nd file will contain:
>2165799 14641 135356 16580+,...,680341-
AAGG….GTTTCTT
and so on.
Upvotes: 2
Views: 192
Reputation: 92559
It seems your data is just newline-separated, so all you need to do is loop over the lines and write each non-blank one to its own, sequentially numbered file:
with open("source.txt") as f:
    counter = 1
    for line in f:
        if not line.strip():
            continue
        with open("out_%03d.txt" % counter, 'w') as out:
            out.write(line)
        counter += 1
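This assumes that each group is really one long line (the real format wasn't clear to me). If the header and the sequence are on separate lines, as they appear in the question, this puts each of them in its own file.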
Because you haven't explained the real format of this file in much detail, here is another option in case those really are newline characters between lines that belong in the same file. If ">" reliably marks the start of a new group, we can use it to start a new file:
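with open("source.txt") as f:
    counter = 1
    out = None
    for line in f:
        # A line starting with ">" begins a new group: close the previous file, open the next
        if line.lstrip().startswith(">"):
            if out is not None:
                out.close()
            out_name = "out_%03d.txt" % counter
            counter += 1
            out = open(out_name, 'w')
        # Skip any lines that appear before the first ">" header
        if out is not None:
            out.write(line)
    if out is not None:
        out.close()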
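A variation that may be easier to check is to separate the ">"-based grouping from the file writing, so the grouping can be tried on a small in-memory sample first. This is a sketch, not part of the answer above; `iter_groups`, `write_groups` and the sample records are made up for illustration.

```python
import io


def iter_groups(lines, marker=">"):
    """Yield lists of lines, starting a new list at each line that begins with marker.

    Any lines before the first marker line are skipped.
    """
    group = None
    for line in lines:
        if line.lstrip().startswith(marker):
            if group is not None:
                yield group
            group = [line]  # the header line opens a new group
        elif group is not None:
            group.append(line)
    if group is not None:
        yield group  # the last group has no following header to close it


def write_groups(lines, pattern="out_%03d.txt"):
    """Write each group to its own numbered file and return the file names."""
    names = []
    for counter, group in enumerate(iter_groups(lines), start=1):
        name = pattern % counter
        with open(name, "w") as out:
            out.writelines(group)
        names.append(name)
    return names


# Try the grouping on an in-memory sample before touching real files
sample = io.StringIO(">rec1 a\nTTCG\nGTA\n>rec2 b\nAAGG\n")
groups = list(iter_groups(sample))
print(len(groups))  # → 2
```

On the real file this would be `with open("source.txt") as f: write_groups(f)`.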
Upvotes: 1
Reputation: 27
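with open("source.txt") as f:
    counter = 1
    # Use a separate line number; otherwise counter stops changing once it reaches 3
    for line_number, line in enumerate(f, start=1):
        # Skip every third line (assumed to be the blank separator line)
        if line_number % 3 == 0:
            continue
        with open("out_%03d.txt" % counter, 'w') as out:
            out.write(line)
        counter += 1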
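If the file really does follow a fixed three-line pattern per record (header line, sequence line, blank separator line), which is an assumption the question does not confirm, the stride idea can also keep each header together with its sequence instead of writing every line to its own file. A minimal sketch using itertools.islice to read three lines at a time; `iter_records` and `write_records` are hypothetical names.

```python
from itertools import islice


def iter_records(lines, size=3):
    """Yield each fixed-size chunk of lines with blank lines removed.

    Assumes every record occupies exactly `size` lines; blank lines in a
    chunk (the separator) are dropped, so a final record without its
    trailing blank line is still handled.
    """
    it = iter(lines)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return  # input exhausted
        yield [line for line in chunk if line.strip()]


def write_records(lines, pattern="out_%03d.txt"):
    """Write each record (header plus sequence) to its own numbered file."""
    for counter, record in enumerate(iter_records(lines), start=1):
        with open(pattern % counter, "w") as out:
            out.writelines(record)


sample = [">h1\n", "ACGT\n", "\n", ">h2\n", "GG\n", "\n", ">h3\n", "TT\n"]
records = list(iter_records(sample))
```

Unlike grouping on the ">" marker, this silently misgroups lines if any record spans more or fewer than three lines, so it is only safe when that layout is guaranteed.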
Upvotes: 0
Reputation: 414129
To write each blank-line-separated group of lines to a separate file, you could use itertools.groupby():
#!/usr/bin/env python
import sys
from itertools import groupby

def blank(line, mark=[0]):
    if not line.strip():  # blank line
        mark[0] ^= 1  # mark the start of new group
    return mark[0]

for i, (_, group) in enumerate(groupby(sys.stdin, blank), start=1):
    with open("group-%03d.txt" % (i,), "w") as outfile:
        outfile.writelines(group)
Usage:
$ python split-on-blank.py < input_file.txt
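The same groupby() technique can also split the ">"-delimited records shown in the question, which may have no blank lines at all: give groupby() a key function whose value changes at every header line, so all lines of one record share a key. A minimal sketch; `make_header_key` and `split_stream` are illustrative names, and the closure with `nonlocal` stands in for the mutable-default-argument trick used in the script above.

```python
from itertools import groupby


def make_header_key(marker=">"):
    """Return a key function whose value increases by one at each header line."""
    current = -1  # lines before the first header (if any) get key -1

    def key(line):
        nonlocal current
        if line.startswith(marker):
            current += 1
        return current

    return key


def split_stream(stream, pattern="record-%03d.txt"):
    """Write each '>'-delimited record from stream to its own numbered file."""
    # A fresh key function per call, because the key keeps state between lines
    for i, (_, group) in enumerate(groupby(stream, make_header_key()), start=1):
        with open(pattern % i, "w") as outfile:
            outfile.writelines(group)


lines = [">r1\n", "TTCG\n", "GTA\n", ">r2\n", "AAGG\n"]
records = [list(group) for _, group in groupby(lines, make_header_key())]
# records == [['>r1\n', 'TTCG\n', 'GTA\n'], ['>r2\n', 'AAGG\n']]
```

After `import sys`, calling `split_stream(sys.stdin)` would play the role of the loop in the script above.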
If you work with such formats often, consider using a proper parser, such as the Bio.SeqIO.parse() function from Biopython.
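For comparison, here is roughly what a FASTA-aware reader does, written without Biopython: it yields one (header, sequence) pair per record and joins sequences that are wrapped over several lines. This is a simplified, hypothetical stand-in (`read_fasta` and `write_by_id` are made-up names), not Bio.SeqIO's implementation; it does no validation, and `write_by_id` assumes the first word of each header is unique and safe to use as a file name.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from FASTA-formatted lines.

    The header is returned without its leading ">", sequence lines are
    joined into one string, and blank lines are ignored.
    """
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:  # ignore text before the first header
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def write_by_id(lines, pattern="%s.fasta"):
    """Write each record to a file named after the first word of its header."""
    for header, seq in read_fasta(lines):
        words = header.split()
        record_id = words[0] if words else "unnamed"  # assumed unique per record
        with open(pattern % record_id, "w") as out:
            out.write(">%s\n%s\n" % (header, seq))


records = list(read_fasta([">2165320 21411\n", "TTCG\n", "GTA\n", "\n", ">2165799 x\n", "AAGG\n"]))
```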
Upvotes: 1