Reputation: 487
Sample Text File:
1. some text here
2. more text here
more text here
more text here
more text here
3. more text here
more text here
more text here
more text here
4. more text here
more text here
more text here
more text here
5. more text here
more text here
more text here
more text here
6. last text here
more text here
more text here
more text here
1. new text here
more text here
more text here
2. some more text
more text here
3. a bit more text
more text here
4. ok this is enough text.
1. nawww heres a bit more text.
more text here
more text here
2. okay this is the final text.
more text here
more text here
3. just to be sure this is last.
more text here
1. etc
This is a sample of the text I have; the real file is much longer.
I have this Python code as a start:
with open("text.txt") as txt_file:
    lines = txt_file.readlines()
    for line in lines:
        if line.startswith('1.'):
            print(line)
But I'm stuck: I have no idea how to write all the lines from one 1. up to the next 1. into a separate file.
I'm assuming that I'd need some sort of for loop inside the last if statement I have there, but I'm not sure how to go about doing that.
As an example of the result I expect: if a line starts with 1., write that line and everything after it into a new text file until the next line that starts with 1., then start the whole process over again until there is no more text.
So for the sample text above I should end up with 4 files.
In this case, file number 1 would have all the text from paragraphs 1-6:
1. some text here
2. more text here
more text here
more text here
more text here
3. more text here
more text here
more text here
more text here
4. more text here
more text here
more text here
more text here
5. more text here
more text here
more text here
more text here
6. last text here
more text here
more text here
more text here
File number 2 would have all the text from the second 1. in the sample text file, paragraphs 1-4:
1. new text here
more text here
more text here
2. some more text
more text here
3. a bit more text
more text here
4. ok this is enough text.
File number 3 would have all the text from the third 1. in the sample text file, paragraphs 1-3:
1. nawww heres a bit more text.
more text here
more text here
2. okay this is the final text.
more text here
more text here
3. just to be sure this is last.
more text here
And so on...
I hope I'm explaining this right and in a way that makes sense.
Upvotes: 1
Views: 2253
Reputation: 336428
One simple approach would be to split the file at each line that starts with 1.:
import re
with open("text.txt") as txt_file:
    content = txt_file.read()
chunks = []
for match in re.split(r"(?=^1\.)", content, flags=re.MULTILINE):
    if match:
        chunks.append(match)
Now you have a list of texts, each starting with 1., that you can iterate over and save to individual files.
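The saving step could look something like the sketch below. The helper names split_on_ones and save_chunks and the part-N.txt file names are made up for illustration, and the example runs on a small in-memory sample rather than text.txt. Note that splitting on a zero-width lookahead like this needs Python 3.7 or newer.

```python
import re
from pathlib import Path


def split_on_ones(content):
    """Split text into chunks, each beginning at a line that starts with '1.'."""
    # Zero-width lookahead split (supported by re.split since Python 3.7);
    # the empty string produced before the first match is filtered out.
    return [c for c in re.split(r"(?=^1\.)", content, flags=re.MULTILINE) if c]


def save_chunks(chunks, out_dir, prefix="part"):
    """Write each chunk to <out_dir>/<prefix>-<n>.txt and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for number, chunk in enumerate(chunks, start=1):
        path = out_dir / f"{prefix}-{number}.txt"  # hypothetical naming scheme
        path.write_text(chunk)
        paths.append(path)
    return paths


# Made-up sample standing in for the contents of text.txt
sample = "1. a\nmore\n2. b\n1. c\nmore\n1. d\n"
chunks = split_on_ones(sample)
print(len(chunks))  # 3
```

Calling save_chunks(chunks, "out") would then write out/part-1.txt, out/part-2.txt and so on, one file per group.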
Upvotes: 4
Reputation: 560
Here's another solution. You can tweak this as you see fit: I found the index of every line that starts with 1., then wrote the lines between consecutive indexes to new files.
with open('test.txt') as f:
    lines = f.readlines()
ones_index = []
for idx, line in enumerate(lines):
    # startswith avoids false matches such as "11." or a "1." in mid-line
    if line.startswith('1.'):
        ones_index.append(idx)
# sentinel so the last group runs to the end of the file
ones_index.append(len(lines))
for i in range(len(ones_index)-1):
    start = ones_index[i]
    stop = ones_index[i+1]
    with open('newfile-{}.txt'.format(i), 'w') as g:
        # lines from readlines() already end in '\n', so write them as-is
        g.writelines(lines[start:stop])
Edit: I just realized this didn't handle the very last range of lines at first. Added a new line to fix this.
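One thing to watch with an index-based approach like this is how the start lines are detected. A substring test such as '1.' in line also matches lines like 11. or 21., or a 1. that appears mid-sentence, while line.startswith('1.') only matches at the beginning of the line. A small sketch of the difference, using a made-up sample and a hypothetical helper group_starts:

```python
def group_starts(lines, use_substring=False):
    """Return the indexes of lines treated as the start of a group."""
    if use_substring:
        # Substring test: matches "1." anywhere in the line
        return [i for i, line in enumerate(lines) if "1." in line]
    # Prefix test: matches only lines that begin with "1."
    return [i for i, line in enumerate(lines) if line.startswith("1.")]


# Made-up sample: index 2 mentions "1." mid-line, index 3 starts with "11."
lines = [
    "1. first\n",
    "more text\n",
    "see step 1. above\n",
    "11. eleventh\n",
    "1. second\n",
]
print(group_starts(lines, use_substring=True))  # [0, 2, 3, 4]
print(group_starts(lines))                      # [0, 4]
```

With the substring test the sample would be cut into four pieces instead of two.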
Upvotes: 1
Reputation: 92461
If you wanted to avoid reading the whole file into memory, you could make a generator that collects groups as they come from the file line by line and yields each one once it is complete. Something like:
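def splitgroups(text):
    lines = None
    for line in text:
        if line.startswith("1."):
            # a new group begins: emit the previous one, if any
            if lines is not None:
                yield lines
            lines = line
        elif lines is not None:
            # anything before the first "1." line is skipped
            lines += line
    # emit the final group (nothing to emit if no line started with "1.")
    if lines is not None:
        yield lines

with open(filepath) as text:
    # iterate over groups rather than lines
    # and do what you want with each chunk:
    for group in splitgroups(text):
        print("*********")
        print(group)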
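If the goal is one output file per group, the same line-by-line idea can write straight to disk without building each group in memory first. This is only a sketch: write_groups and the group-N.txt naming are made up here, and it assumes lines before the first 1. should be dropped.

```python
import os
import tempfile


def write_groups(lines, out_dir, prefix="group"):
    """Stream lines into numbered files, starting a new file at each '1.' line.

    Returns the number of files written.
    """
    count = 0
    out = None
    try:
        for line in lines:
            if line.startswith("1."):
                # close the previous group's file before opening the next one
                if out is not None:
                    out.close()
                count += 1
                out = open(os.path.join(out_dir, f"{prefix}-{count}.txt"), "w")
            if out is not None:  # lines before the first "1." are skipped
                out.write(line)
    finally:
        if out is not None:
            out.close()
    return count


# Made-up sample written into a temporary directory
sample = ["1. a\n", "more\n", "2. b\n", "1. c\n", "more\n"]
with tempfile.TemporaryDirectory() as tmp:
    print(write_groups(sample, tmp), sorted(os.listdir(tmp)))
```

An open file object can be passed in place of the list, since iterating over a file yields one line at a time.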
Upvotes: 0
Reputation: 87
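Create a counter n = 0 and increment it every time a line starts with 1.; n then tells you which group the current line belongs to:
n = 0
with open("text.txt") as txt_file:
    for line in txt_file:
        if line.startswith("1."):
            n += 1  # a new group starts here
        print(n, line, end="")
If you want, you can create a dict that stores the lines of each group as a list, keyed by n. Then you can create a CSV file with the pandas library. I hope this helps, if I understand correctly.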
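The dictionary idea could be sketched roughly like this. The function name group_lines is made up, the sample is in-memory, and the pandas/CSV step is left out; it only shows collecting each group's lines into a list keyed by the group number.

```python
from collections import defaultdict


def group_lines(lines):
    """Map group number -> list of lines; each '1.' line opens a new group."""
    groups = defaultdict(list)
    n = 0  # number of the current group; 0 means no group has started yet
    for line in lines:
        if line.startswith("1."):
            n += 1
        if n:  # ignore anything before the first "1." line
            groups[n].append(line.rstrip("\n"))
    return dict(groups)


# Made-up sample standing in for the lines of the file
sample = ["1. a", "more", "2. b", "1. c", "more"]
print(group_lines(sample))
# {1: ['1. a', 'more', '2. b'], 2: ['1. c', 'more']}
```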
Upvotes: 0