JareBear

Reputation: 487

Python: write the lines of a text between a range of numbers to a new file

Sample Text File:

1. some text here
2. more text here
more text here
more text here
more text here
3. more text here
more text here
more text here
more text here
4. more text here
more text here
more text here
more text here
5. more text here
more text here
more text here
more text here
6. last text here
more text here
more text here
more text here

1. new text here
more text here
more text here
2. some more text
more text here
3. a bit more text
more text here
4. ok this is enough text.

1. nawww heres a bit more text.
more text here
more text here
2. okay this is the final text.
more text here
more text here
3. just to be sure this is last.
more text here
1. etc

This is a sample of the text I have; the real file is much longer.

I have this Python code as a start:

with open("text.txt") as txt_file:
    lines = txt_file.readlines()
    for line in lines:
        if line.startswith('1.'):
            print(line)

But I'm stuck: I have no idea how to write all the lines from one 1. up to the next 1. into a separate file.

I'm assuming I'd need some sort of for loop inside that last if statement, but I'm not sure how to go about it.

Here is an example of the results I expect:

If a line starts with 1., write it and the lines after it into a new text file until the next line that starts with 1., then start the whole process over again until there is no more text. So for the sample text above I should get 4 files.

In this case, file 1 would contain all the text from paragraphs 1-6:

1. some text here
2. more text here
more text here
more text here
more text here
3. more text here
more text here
more text here
more text here
4. more text here
more text here
more text here
more text here
5. more text here
more text here
more text here
more text here
6. last text here
more text here
more text here
more text here

File 2 would contain all the text from the second 1. in the sample file, covering paragraphs 1-4:

1. new text here
more text here
more text here
2. some more text
more text here
3. a bit more text
more text here
4. ok this is enough text.

File 3 would contain all the text from the third 1. in the sample file, covering paragraphs 1-3:

1. nawww heres a bit more text.
more text here
more text here
2. okay this is the final text.
more text here
more text here
3. just to be sure this is last.
more text here

And so on...

I hope I'm explaining this clearly and in a way that makes sense.

Upvotes: 1

Views: 2253

Answers (4)

Tim Pietzcker

Reputation: 336428

One simple approach would be to split the file at each line that starts with 1.:

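import re

with open("text.txt") as txt_file:
    content = txt_file.read()
    chunks = []
    # Split just before every line that starts with "1." (the lookahead keeps
    # "1." in each chunk; splitting on an empty match needs Python 3.7+).
    for match in re.split(r"(?=^1\.)", content, flags=re.MULTILINE):
        if match:  # skip the empty string before the first "1."
            chunks.append(match)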

Now you have a list of texts each starting with 1. that you can iterate over and save to individual files.
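A minimal sketch of that last step, repeating the same split so it runs on its own. The function names split_chunks and write_chunks and the output names chunk-1.txt, chunk-2.txt, ... are hypothetical choices for illustration, not part of the answer:

```python
import os
import re


def split_chunks(content):
    """Split text into chunks, each starting at a line that begins with '1.'."""
    # The zero-width lookahead keeps "1." at the start of each chunk;
    # the filter drops the empty string produced before the first match.
    return [c for c in re.split(r"(?=^1\.)", content, flags=re.MULTILINE) if c]


def write_chunks(chunks, out_dir="."):
    """Write each chunk to its own file (chunk-1.txt, chunk-2.txt, ...)."""
    paths = []
    for number, chunk in enumerate(chunks, start=1):
        path = os.path.join(out_dir, "chunk-{}.txt".format(number))
        with open(path, "w") as out:
            out.write(chunk)
        paths.append(path)
    return paths
```

For the question's file this would be called as write_chunks(split_chunks(txt_file.read())) inside the with block, which should produce one file per 1. section, i.e. 4 files for the sample text.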

Upvotes: 4

Jon Behnken

Reputation: 560

Here's another solution. You can tweak it as you see fit: I found the indexes of all the lines marked 1., then wrote the lines between consecutive indexes to new files.

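with open('test.txt') as f:
    lines = f.readlines()

# Index of every line that starts with "1." (a plain `'1.' in line` test
# would also match lines such as "11." or "version 1.2").
ones_index = []
for idx, line in enumerate(lines):
    if line.startswith('1.'):
        ones_index.append(idx)

# Add the end of the file as a final stop so the last range is written too.
ones_index.append(len(lines))

for i in range(len(ones_index) - 1):
    start = ones_index[i]
    stop = ones_index[i + 1]
    with open('newfile-{}.txt'.format(i), 'w') as g:
        # readlines() keeps each "\n", so write the lines unchanged
        # ('\n'.join would double every line break).
        g.writelines(lines[start:stop])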

Edit: I just realized this didn't handle the very last range of lines at first. Added a new line to fix this.
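The same index-based idea can also be sketched as a pure function that pairs each start index with the next one using zip, so the last range falls out naturally instead of needing a fix-up line. The name split_by_index is hypothetical, and this sketch tests line.startswith("1.") rather than "1." in line, so a line such as "version 1.2" does not start a new group:

```python
def split_by_index(lines):
    """Group lines into chunks that each start at a line beginning with '1.'."""
    # Indexes of every line that starts a new group.
    starts = [idx for idx, line in enumerate(lines) if line.startswith("1.")]
    # Each group stops where the next one starts; the last runs to the end.
    stops = starts[1:] + [len(lines)]
    return [lines[start:stop] for start, stop in zip(starts, stops)]
```

For example, groups = split_by_index(f.readlines()), then write each group with g.writelines(group); the lines from readlines() already end in a newline, so no join is needed.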

Upvotes: 1

Mark

Reputation: 92461

If you want to avoid reading the whole file into memory, you could write a generator that collects lines into groups as they are read from the file and yields each group once it is complete. Something like:

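def splitgroups(text):
    """Yield each group of lines, starting at a line that begins with "1."."""
    lines = None
    for line in text:
        if line.startswith("1."):
            if lines is not None:
                yield lines  # the previous group is complete
            lines = line  # start a new group
        elif lines is not None:  # skip any lines before the first "1."
            lines += line
    if lines is not None:  # don't yield None for an empty file
        yield lines

with open("text.txt") as text:
    # iterate over groups rather than lines
    # and do what you want with each chunk:
    for group in splitgroups(text):
        print("*********")
        print(group)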
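Another way to stream the file without holding it all in memory, swapping in the standard library's itertools.groupby for a hand-written generator: a key function counts the "1." lines seen so far, so consecutive lines with the same count end up in the same group. The name iter_groups is hypothetical; this is a sketch, not the answer's own code:

```python
import itertools


def iter_groups(lines):
    """Yield groups of lines, starting a new group at each line beginning with '1.'.

    Works on any iterable of lines (e.g. an open file), so the file is
    read lazily rather than all at once.
    """
    count = 0

    def group_key(line):
        nonlocal count
        if line.startswith("1."):
            count += 1  # each "1." line opens a new group
        return count

    for key, group in itertools.groupby(lines, key=group_key):
        if key > 0:  # key 0 means lines before the first "1."; skip them
            yield "".join(group)
```

Usage would look the same as above: with open("text.txt") as text: for group in iter_groups(text): ...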

Upvotes: 0

Barış Can Tayiz

Reputation: 87

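You can create a counter variable n = 0 and increase it every time a line starts with 1., so n always tells you which group the current line belongs to: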

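n = 0  # number of the current group (incremented at each "1.")
out = None  # file handle for the current group
with open("text.txt") as txt_file:
    for line in txt_file:
        if line.startswith("1."):
            n += 1
            if out is not None:
                out.close()
            out = open("group-{}.txt".format(n), "w")
        if out is not None:  # lines before the first "1." are skipped
            out.write(line)
if out is not None:
    out.close()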

If you want, you can instead store the lines in a dict that maps each group to a list of its lines, and then write that out, for example as a CSV file with the pandas library. I hope this helps, if I understood correctly.
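The answer mentions pandas; as a stdlib-only sketch, the same dict-of-lists idea can be written with the csv module instead. The function names lines_by_group and write_groups_csv and the one-row-per-group layout (group number, joined text) are assumptions, not part of the original answer:

```python
import csv


def lines_by_group(lines):
    """Map group number (1, 2, ...) to the list of lines in that group.

    A new group starts at every line that begins with "1."; lines before
    the first such line are ignored.
    """
    groups = {}
    n = 0
    for line in lines:
        if line.startswith("1."):
            n += 1
        if n > 0:
            groups.setdefault(n, []).append(line.rstrip("\n"))
    return groups


def write_groups_csv(groups, path):
    """Write one CSV row per group: the group number and its joined text."""
    with open(path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["group", "text"])
        for n, group_lines in groups.items():
            # Fields containing newlines are quoted automatically by csv.
            writer.writerow([n, "\n".join(group_lines)])
```

With pandas installed, the same groups dict could be turned into a DataFrame before writing; the csv module simply avoids the extra dependency.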

Upvotes: 0
