Reputation: 949
I have a plain text file with the following contents:
@M00964: XXXXX
YYY
+
ZZZZ
@M00964: XXXXX
YYY
+
ZZZZ
@M00964: XXXXX
YYY
+
ZZZZ
and I would like to read this into a list, split into items according to the ID code @M00964, i.e.:
['@M00964: XXXXX
YYY
+
ZZZZ',
'@M00964: XXXXX
YYY
+
ZZZZ',
'@M00964: XXXXX
YYY
+
ZZZZ']
I have tried using
in_file = open(fileName,"r")
sequences = in_file.read().split('@M00964')[1:]
in_file.close()
but this removes the ID sequence @M00964. Is there any way to keep this ID sequence in?
As an additional question, is there any way of maintaining whitespace in a list (rather than having \n symbols)?
My overall aim is to read in this set of items, take the first 2, for example, and write them back to a text file maintaining all of the original formatting.
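On the whitespace question: the `\n` shown when a list is printed is Python's `repr` of a newline character, not a literal backslash followed by `n` stored in the string. A minimal sketch, using one record from the sample file above, showing that printing an item directly (or writing it to a file) reproduces the real line breaks:

```python
# One record from the sample file, held as a single string with real newlines
record = "@M00964: XXXXX\nYYY\n+\nZZZZ\n"
records = [record, record]

# Printing the list shows repr() of each item, so newlines appear as \n
print(records)

# Printing an item directly shows the actual line breaks
print(records[0], end="")

# The string holds 4 newline characters and no literal backslash-n
print(record.count("\n"))
```

So nothing needs to be done to "keep" the whitespace: it is already in the strings, and writing them to a file outputs the original layout.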
Upvotes: 3
Views: 2729
Reputation: 32189
Specific to your example (every record is exactly four lines), can't you just do something like the following:
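with open(fileName, 'r') as in_file:
    file = in_file.readlines()
# join every group of 4 lines into one record string
new_list = [''.join(file[i*4:(i+1)*4]) for i in range(len(file) // 4)]
# the same records with the newline characters stripped out
list_no_n = [item.replace('\n', '') for item in new_list]
print(new_list)
print(list_no_n)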
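For comparison, a sketch of the same fixed four-line grouping written with the `zip(*[iter(lines)] * 4)` idiom (the "grouper" pattern from the `itertools` recipes). Like the slicing version, it assumes every record is exactly four lines and silently drops an incomplete trailing group; `io.StringIO` stands in for the real file.

```python
import io

# Stand-in for open(fileName): the three 4-line records from the question
sample = io.StringIO(
    "@M00964: XXXXX\nYYY\n+\nZZZZ\n"
    "@M00964: XXXXX\nYYY\n+\nZZZZ\n"
    "@M00964: XXXXX\nYYY\n+\nZZZZ\n"
)
lines = sample.readlines()

# zip() pulls from the same iterator four times per tuple,
# so each tuple holds four consecutive lines
groups = zip(*[iter(lines)] * 4)
new_list = [''.join(group) for group in groups]

print(len(new_list))
print(new_list[0], end='')
```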
[EXPANDED FORM]
new_list = []
# Iterate over 1/4 of the number of lines in the file,
# because we will be dealing in groups of 4 lines
for i in range(len(file) // 4):
    # Join four lines together into a string and add it to new_list
    new_list.append(''.join(file[i*4:(i+1)*4]))
[Writing to new file]
# each record already ends with '\n', so write the strings as-is
# (slice new_list, e.g. new_list[:2], to write only some records)
with open(filename, 'w') as output_file:
    output_file.writelines(new_list)
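Because each record string still ends in `\n`, writing the records back unchanged reproduces the original formatting. A self-contained sketch of the question's overall aim (keep the first two records), assuming four-line records; the temporary file paths are placeholders for the real input and output files.

```python
import os
import tempfile

# Sample contents from the question: three 4-line records
original = (
    "@M00964: XXXXX\nYYY\n+\nZZZZ\n"
    "@M00964: XXXXX\nYYY\n+\nZZZZ\n"
    "@M00964: XXXXX\nYYY\n+\nZZZZ\n"
)

with tempfile.TemporaryDirectory() as tmpdir:
    # Placeholder paths standing in for the real input and output files
    in_path = os.path.join(tmpdir, "input.txt")
    out_path = os.path.join(tmpdir, "output.txt")

    with open(in_path, "w") as f:
        f.write(original)

    # Read the lines and join every 4 of them into one record string
    with open(in_path, "r") as in_file:
        lines = in_file.readlines()
    records = [''.join(lines[i*4:(i+1)*4]) for i in range(len(lines) // 4)]

    # Each record already ends with '\n', so write the first two as-is
    with open(out_path, "w") as out_file:
        out_file.writelines(records[:2])

    with open(out_path, "r") as f:
        written = f.read()

print(written, end="")
```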
Upvotes: 0
Reputation: 2960
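Just split on the @ sign instead, inserting a marker before each @ so that it is kept:
with open(fileName,"r") as in_file:
    # assumes '###' never occurs in the file and '@' only starts a record;
    # [1:] drops the empty string before the first '@'
    sequences = in_file.read().replace("@","###@").split('###')[1:]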
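An alternative that avoids choosing a marker string: split with a zero-width lookahead, so the match consumes nothing and the ID stays at the start of each piece. This is a sketch assuming Python 3.7 or later (earlier versions of `re.split` do not split on empty matches); like the answer above, it reads the whole file into memory, and `io.StringIO` stands in for the file.

```python
import io
import re

sample = io.StringIO(
    "@M00964: XXXXX\nYYY\n+\nZZZZ\n"
    "@M00964: XXXXX\nYYY\n+\nZZZZ\n"
    "@M00964: XXXXX\nYYY\n+\nZZZZ\n"
)
text = sample.read()

# (?m)^ matches at the start of every line; the lookahead (?=@M00964)
# only checks for the ID without consuming it, so the ID is kept
pieces = re.split(r'(?m)^(?=@M00964)', text)

# The first split point is position 0, which leaves an empty leading string
sequences = [p for p in pieces if p]

print(sequences)
```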
Upvotes: 0
Reputation: 45542
If your file is large and you don't want to hold the whole thing in memory, you can iterate over individual records using this helper function:
def chunk_records(filepath):
    with open(filepath, 'r') as f:
        record = []
        for line in f:
            # could use regex for more complicated matching
            if line.startswith('@M00964') and record:
                # a new ID line starts the next record: emit the current one
                yield ''.join(record)
                record = []
            # always keep the line, including the ID line of a new record
            record.append(line)
        if record:
            yield ''.join(record)
Use it like:
for record in chunk_records('/your/filename.txt'):
    ...
Or if you want the whole thing in memory:
records = list(chunk_records('/your/filename.txt'))
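For the question's overall aim, a generator pairs well with `itertools.islice`, which stops after the first N records without reading the rest of the input. The sketch below uses a hypothetical `iter_records(lines, prefix)` variant of `chunk_records` that accepts any iterable of lines (an open file, or `io.StringIO` as here), so it runs without a file on disk. It appends the ID line to the new record after yielding the previous one, so every record keeps its ID.

```python
import io
from itertools import islice


def iter_records(lines, prefix='@M00964'):
    """Hypothetical variant of chunk_records: group lines into records
    that each start with `prefix`, yielding one string per record."""
    record = []
    for line in lines:
        if line.startswith(prefix) and record:
            yield ''.join(record)
            record = []
        record.append(line)
    if record:
        yield ''.join(record)


sample = io.StringIO(
    "@M00964: XXXXX\nYYY\n+\nZZZZ\n"
    "@M00964: XXXXX\nYYY\n+\nZZZZ\n"
    "@M00964: XXXXX\nYYY\n+\nZZZZ\n"
)

# Take only the first two records; the generator is not run any further
first_two = list(islice(iter_records(sample), 2))

# Write them back out unchanged (StringIO stands in for an output file)
out = io.StringIO()
out.writelines(first_two)
print(out.getvalue(), end='')
```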
Upvotes: 3