Reputation: 25
I have a .txt file with 5 sections of data. Each section has a header line "Section X". I would like to parse it and write 5 separate files from this single file. Each section would start at its header and end before the next section header. The code below creates 5 separate files; however, they are all blank.
from itertools import cycle
filename = raw_input("Which file?: \n")
dimensionsList = ["Section 1", "Section 2",
                  "Section 3", "Section 4", "Section 5"]
with open(filename+".txt", "rb") as oldfile:
    for i in dimensionsList:
        licycle = cycle(dimensionsList)
        nextelem = licycle.next()
        with open(i+".txt", "w") as newfile:
            for line in oldfile:
                if line.strip() == i:
                    break
            for line in oldfile:
                if line.strip() == nextelem:
                    break
                newfile.write(line)
Upvotes: 1
Views: 377
Reputation: 1875
I tested your code and it worked only for Section 1 (the other files were blank for me, too). The problem is the transition between sections, and also that licycle is re-created on every iteration, so it always starts over at "Section 1".
The "Section 2" header line is consumed by the second for loop (at the if line.strip() == nextelem check). The next line read from the file is therefore Section 2's data, not the "Section 2" header itself.
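The two effects described above can be reproduced in isolation. The following is a minimal sketch, written for Python 3 (so next(...) replaces .next()), with io.StringIO standing in for the real file. The helper names first_of_fresh_cycles and lines_after_break are hypothetical and exist only for illustration.

```python
import io
from itertools import cycle


def first_of_fresh_cycles(names):
    """Create a new cycle on every iteration, as the question's loop does."""
    # A freshly created cycle always yields the first element first.
    return [next(cycle(names)) for _ in names]


def lines_after_break(text, stop_at):
    """Break out of one loop over a stream, then keep iterating the same stream."""
    stream = io.StringIO(text)  # stand-in for the opened file
    for line in stream:
        if line.strip() == stop_at:
            break  # the header line has now been consumed
    # A second loop continues from the current position, after the header.
    return [line.strip() for line in stream]


print(first_of_fresh_cycles(["Section 1", "Section 2", "Section 3"]))
print(lines_after_break("Section 1\naaaa\nSection 2\nccc\n", "Section 2"))
```

The first call shows that nextelem never advances when the cycle is rebuilt inside the loop. The second shows that a later loop over the same file object cannot see the header again.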
It is hard to explain in words, so try the code below, which prints what happens to each line:
from itertools import cycle
filename = raw_input("Which file?: \n")
dimensionsList = ["Section 1", "Section 2", "Section 3", "Section 4",
                  "Section 5"]
with open(filename + ".txt", "rb") as oldfile:
    licycle = cycle(dimensionsList)
    nextelem = licycle.next()
    for i in dimensionsList:
        print(nextelem)
        with open(i + ".txt", "w") as newfile:
            for line in oldfile:
                print("ignoring %s" % (line.strip()))
                if line.strip() == i:
                    nextelem = licycle.next()
                    break
            for line in oldfile:
                if line.strip() == nextelem:
                    # nextelem = licycle.next()
                    print("ignoring %s" % (line.strip()))
                    break
                print("printing %s" % (line.strip()))
                newfile.write(line)
        print('')
It will print:
Section 1
ignoring Section 1
printing aaaa
printing bbbb
ignoring Section 2
Section 2
ignoring ccc
ignoring ddd
ignoring Section 3
ignoring eee
ignoring fff
ignoring Section 4
ignoring ggg
ignoring hhh
ignoring Section 5
ignoring iii
ignoring jjj
Section 2
Section 2
Section 2
It works for Section 1 and detects the "Section 2" header, but then it keeps ignoring every remaining line: that header has already been consumed, so the search for "Section 2" never succeeds.
If you restarted reading from the first line of the file for every section, I think the program would work. But I wrote a simpler version that should work for you:
from itertools import cycle
filename = raw_input("Which file?: \n")
dimensionsList = ["Section 1", "Section 2", "Section 3", "Section 4",
                  "Section 5"]
with open(filename + ".txt", "rb") as oldfile:
    licycle = cycle(dimensionsList)
    nextelem = licycle.next()
    newfile = None
    line = oldfile.readline()
    while line:
        # Case 1: Found new section
        if line.strip() == nextelem:
            if newfile is not None:
                newfile.close()
            nextelem = licycle.next()
            newfile = open(line.strip() + '.txt', 'w')
        # Case 2: Print line to current section
        elif newfile is not None:
            newfile.write(line)
        line = oldfile.readline()
    # Close the file of the last section
    if newfile is not None:
        newfile.close()
When it finds a section header, it closes the current file and starts writing to a new one. Otherwise, it keeps writing to the current file.
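For Python 3, the same single-pass idea could be sketched roughly as follows. This is a variation, not the code above: it matches any known header through a set instead of tracking the next expected one with cycle, and the function name split_sections and its out_dir parameter are hypothetical. As in the code above, the header line itself is not copied into the output file.

```python
import os


def split_sections(path, headers, out_dir="."):
    """Write each section of `path` to '<header>.txt'; return the files written."""
    wanted = set(headers)
    written = []
    newfile = None
    try:
        with open(path, "r") as oldfile:
            for line in oldfile:
                name = line.strip()
                if name in wanted:
                    # Case 1: found a new section, so switch output files.
                    if newfile is not None:
                        newfile.close()
                    target = os.path.join(out_dir, name + ".txt")
                    newfile = open(target, "w")
                    written.append(target)
                elif newfile is not None:
                    # Case 2: ordinary line, copy it to the current section.
                    newfile.write(line)
    finally:
        # Make sure the last section's file is closed, even on errors.
        if newfile is not None:
            newfile.close()
    return written
```

To keep the header in each output file, one option is to call newfile.write(line) right after opening the new file.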
P.S.: Below is the example file I used:
Section 1
aaaa
bbbb
Section 2
ccc
ddd
Section 3
eee
fff
Section 4
ggg
hhh
Section 5
iii
jjj
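For completeness, the "restart from the first line for every section" idea mentioned earlier might look like the sketch below. It is written for Python 3 and is only an illustration under that assumption; the function name split_by_rewinding and the out_dir parameter are hypothetical. It rereads the file once per section, so the single-pass version is usually preferable for large files.

```python
import os


def split_by_rewinding(path, headers, out_dir="."):
    """For each header, rewind the input and copy that section's lines."""
    with open(path, "r") as oldfile:
        for header in headers:
            oldfile.seek(0)  # restart from the first line for every section
            target = os.path.join(out_dir, header + ".txt")
            with open(target, "w") as newfile:
                copying = False
                for line in oldfile:
                    name = line.strip()
                    if name == header:
                        copying = True  # header found, copy what follows
                    elif copying and name in headers:
                        break  # any other header ends this section
                    elif copying:
                        newfile.write(line)
```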
Upvotes: 1