GiorgosPap31

Reputation: 43

Python: split one txt file into multiple txt files

I have a single txt file which contains multiple data samples in the form

ID-001

some data

ID-002

some other data

ID-003

some more data

and so on. Each ID-i line starts a new data set, the IDs are unique, and the file is about 2000 lines long.

I want to write a Python script that opens this file and creates multiple txt files, each containing everything between ID-(i-1) and ID-i, no matter how many data samples the file contains.

Any ideas?

Upvotes: 0

Views: 70

Answers (1)

dawg

Reputation: 103754

You could use a regex like so:

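import re

ur_file = 'data.txt'  # placeholder path to your input file

# group 1 = the ID line, group 2 = everything up to the next ID line or end of file
pat = r'^(ID-\d+)$([\s\S]+?)(?=(?:^ID-\d+)|\Z)'
with open(ur_file) as f:
    for m in re.finditer(pat, f.read(), flags=re.M):
        print(m.group(2))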

Prints:

some data



some other data



some more data

m.group(1) holds the ID-xxx line, which you can use to name the file each block is written to.
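
A minimal sketch of that idea follows. The function name split_to_files, the out_dir parameter, and the <ID>.txt naming scheme are assumptions made for illustration; the question does not say how the output files should be named.

```python
import re
from pathlib import Path

# Same pattern as above: group 1 is the ID line, group 2 is the data after it.
PAT = re.compile(r'^(ID-\d+)$([\s\S]+?)(?=(?:^ID-\d+)|\Z)', flags=re.M)

def split_to_files(src, out_dir):
    """Write each ID block of src to out_dir/<ID>.txt (hypothetical naming)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    text = Path(src).read_text()
    written = []
    for m in PAT.finditer(text):
        target = out / f'{m.group(1)}.txt'
        # strip() drops the blank lines around the data; remove it to keep them
        target.write_text(m.group(2).strip() + '\n')
        written.append(target.name)
    return written
```

Calling split_to_files('data.txt', 'out') would then produce out/ID-001.txt, out/ID-002.txt and so on, one file per ID found. Because the IDs are unique, no file overwrites another.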

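Or you can split the file contents into data blocks (this discards the IDs) like so:

import re
ur_file = 'data.txt'  # placeholder path to your input file
with open(ur_file) as f:
    # the split drops the ID lines; `if b` filters out the empty first piece
    print([b for b in re.split(r'^ID-\d+', f.read(), flags=re.M) if b])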

Prints:

['\nsome data\n\n', '\nsome other data\n\n', '\nsome more data\n']
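
Note that the split above throws the IDs away. If you need them, wrapping the pattern in a capturing group makes re.split return the matched separators as well. A small sketch, where text is a stand-in for f.read() and its layout is assumed from the output shown above:

```python
import re

# Stand-in for f.read(); the layout is assumed from the output shown above.
text = 'ID-001\nsome data\n\nID-002\nsome other data\n\nID-003\nsome more data\n'

# With a capturing group, re.split keeps each matched ID in the result list.
parts = re.split(r'^(ID-\d+)$', text, flags=re.M)

# parts[0] is whatever precedes the first ID (empty here); after that the
# list alternates ID, data, ID, data, ...
blocks = dict(zip(parts[1::2], parts[2::2]))
print(blocks)
```

Prints:

{'ID-001': '\nsome data\n\n', 'ID-002': '\nsome other data\n\n', 'ID-003': '\nsome more data\n'}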

Or you can use re.findall like so:

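import re

ur_file = 'data.txt'  # placeholder path to your input file
pat = r'^(ID-\d+)$([\s\S]+?)(?=(?:^ID-\d+)|\Z)'
with open(ur_file) as f:
    # with two groups, findall returns a list of (ID, data) tuples
    print(re.findall(pat, f.read(), flags=re.M))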

Prints:

[('ID-001', '\nsome data\n\n'), ('ID-002', '\nsome other data\n\n'), ('ID-003', '\nsome more data\n')]

Again, you can use each (ID, data) tuple to write the blocks into separate files.

Upvotes: 1
