Reputation: 43
I have a single txt file which contains multiple data samples in the form
ID-001
some data

ID-002
some other data

ID-003
some more data
and so on. Each ID-i line starts a new data set, the IDs are unique, and the file is about 2000 lines long.
I want to write a Python script that opens this file and creates multiple txt files, one per data set, each containing everything between ID-(i-1) and ID-i, no matter how many data sets the file holds.
Any ideas?
Upvotes: 0
Views: 70
Reputation: 103754
You could use a regex like so:
import re

# ur_file is a placeholder for the path to your input file
pat = r'^(ID-\d+)$([\s\S]+?)(?=(?:^ID-\d+)|\Z)'
with open(ur_file) as f:
    for m in re.finditer(pat, f.read(), flags=re.M):
        print(m.group(2))
Prints:
some data
some other data
some more data
m.group(1) will have the ID-xxx, and you could use that to write each block into a file.
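As a rough sketch of that last step, the loop below writes each block to a file named after its ID. The function name split_to_files, the out_dir argument, and the ID-xxx.txt naming scheme are assumptions made for illustration, not part of the question; it also strips leading and trailing blank lines from each block, which you may not want.

```python
import re
from pathlib import Path

# Same pattern as above: an ID on its own line, then everything up to the next ID or end of text.
PAT = re.compile(r'^(ID-\d+)$([\s\S]+?)(?=(?:^ID-\d+)|\Z)', flags=re.M)

def split_to_files(text, out_dir):
    """Write each ID block in `text` to out_dir/<ID>.txt and return the paths written."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for m in PAT.finditer(text):
        path = out_dir / f"{m.group(1)}.txt"          # hypothetical naming scheme
        path.write_text(m.group(2).strip() + "\n")    # drop surrounding blank lines
        written.append(path)
    return written
```

You would call it with something like split_to_files(open(ur_file).read(), "out"), where "out" is whatever directory you want the pieces in.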
Or, you can split the block into data blocks like so:
import re

with open(ur_file) as f:
    print([b for b in re.split(r'^ID-\d+', f.read(), flags=re.M) if b])
Prints:
['\nsome data\n\n', '\nsome other data\n\n', '\nsome more data\n']
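Note that the plain split throws the IDs away. If you need them, one option (a sketch, not the only way) is to wrap the ID in a capturing group: re.split then keeps the captured delimiters in the result, so IDs and bodies alternate. The helper name split_with_ids is made up for this example, and it assumes the IDs are unique, as stated in the question.

```python
import re

def split_with_ids(text):
    """Return {ID: body} using re.split with a capturing group around the ID."""
    parts = re.split(r'^(ID-\d+)', text, flags=re.M)
    # parts[0] is whatever precedes the first ID (usually empty);
    # after that the list alternates: ID, body, ID, body, ...
    ids, bodies = parts[1::2], parts[2::2]
    return {block_id: body.strip() for block_id, body in zip(ids, bodies)}
```

The resulting dict can then be looped over to write one file per ID.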
Or you can use re.findall like so:
import re

pat = r'^(ID-\d+)$([\s\S]+?)(?=(?:^ID-\d+)|\Z)'
with open(ur_file) as f:
    print(re.findall(pat, f.read(), flags=re.M))
Prints:
[('ID-001', '\nsome data\n\n'), ('ID-002', '\nsome other data\n\n'), ('ID-003', '\nsome more data\n')]
Again, you can use that tuple data to write into separate files.
Upvotes: 1