K.Jolly


Python - Writing Separate Files per Section of a Single File

I have a .txt file with 5 sections of data. Each section has a header line "Section X". I would like to parse this single file and write 5 separate files from it. Each section starts at its header and ends just before the next section header. The code below creates 5 separate files; however, they are all blank.

from itertools import cycle

filename = raw_input("Which file?: \n")

dimensionsList = ["Section 1", "Section 2",
    "Section 3", "Section 4", "Section 5"]

with open(filename+".txt", "rb") as oldfile:
    for i in dimensionsList:
        licycle = cycle(dimensionsList)
        nextelem = licycle.next()
        with open(i+".txt", "w") as newfile: 
            for line in oldfile:
                if line.strip() == i:
                    break
            for line in oldfile:
                if line.strip() == nextelem:
                    break
                newfile.write(line)


Answers (1)

Rafael


Problem

When I tested your code, it worked only for Section 1 (the others were blank for me, too). I see two problems: the transition between sections, and licycle being recreated on every iteration, so nextelem is always "Section 1".

The "Section 2" header is consumed by the second for loop (if line.strip() == nextelem). When the outer loop moves on to Section 2, the next line read is therefore the data of Section 2, not the header text "Section 2".

It is hard to explain in words, so test the code below:

from itertools import cycle

filename = raw_input("Which file?: \n")

dimensionsList = ["Section 1", "Section 2", "Section 3", "Section 4",
                  "Section 5"]

with open(filename + ".txt", "rb") as oldfile:
    licycle = cycle(dimensionsList)
    nextelem = licycle.next()
    for i in dimensionsList:
        print(nextelem)
        with open(i + ".txt", "w") as newfile:
            for line in oldfile:
                print("ignoring %s" % (line.strip()))
                if line.strip() == i:
                    nextelem = licycle.next()
                    break
            for line in oldfile:
                if line.strip() == nextelem:
                    # nextelem = licycle.next()
                    print("ignoring %s" % (line.strip()))
                    break
                print("printing %s" % (line.strip()))
                newfile.write(line)
            print('')

It will print:

Section 1
ignoring Section 1
printing aaaa
printing bbbb
ignoring Section 2

Section 2
ignoring ccc
ignoring ddd
ignoring Section 3
ignoring eee
ignoring fff
ignoring Section 4
ignoring ggg
ignoring hhh
ignoring Section 5
ignoring iii
ignoring jjj

Section 2

Section 2

Section 2

It works for Section 1 and detects the "Section 2" header, but on the next iteration it keeps ignoring lines because it never finds "Section 2" again: that header was already read.
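To make that concrete, here is a minimal sketch (Python 3, using io.StringIO as a stand-in for the real file, which is my assumption and not part of the original code). It shows that a line read by one for loop is gone for any later loop over the same file object:

```python
import io

# In-memory stand-in for the input file (hypothetical sample data).
source = io.StringIO("Section 1\naaaa\nSection 2\nccc\n")

# First loop: stop at the "Section 2" header. The header line is consumed here.
for line in source:
    if line.strip() == "Section 2":
        break

# A later loop resumes AFTER the header, so the header can no longer be found.
remaining = [line.strip() for line in source]
found_again = "Section 2" in remaining

print(remaining)     # only the lines after the header are left
print(found_again)
```

The same thing happens with a real file opened with open(): iterating does not start over unless the file is rewound or reopened.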

If you restarted reading from the first line for every section, I think the program would work. But I wrote simpler code that should work for you.
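For reference, the "restart from line 1" idea could look roughly like the sketch below. It is written for Python 3 and uses io.StringIO instead of a real file so it runs on its own. The function name extract_section and the choice to stop at any known header (rather than only the expected next one) are my assumptions, not something from the question:

```python
import io

def extract_section(source, header, all_headers):
    """Return the lines between `header` and the next known header."""
    source.seek(0)  # rewind so every search starts from the first line
    lines = iter(source)

    # Skip everything up to and including the wanted header.
    for line in lines:
        if line.strip() == header:
            break

    # Collect lines until another header (or the end of the input) is reached.
    body = []
    for line in lines:
        if line.strip() in all_headers:
            break
        body.append(line)
    return body

headers = ["Section 1", "Section 2", "Section 3"]
sample = io.StringIO("Section 1\naaaa\nbbbb\nSection 2\nccc\nSection 3\neee\n")
sections = {h: extract_section(sample, h, headers) for h in headers}
```

This rereads the input once per section, which is fine for small files but is the reason a single pass is simpler.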

Solution

from itertools import cycle

filename = raw_input("Which file?: \n")

dimensionsList = ["Section 1", "Section 2", "Section 3", "Section 4",
                  "Section 5"]

with open(filename + ".txt", "rb") as oldfile:

    licycle = cycle(dimensionsList)
    nextelem = licycle.next()
    newfile = None
    line = oldfile.readline()

    while line:

        # Case 1: Found new section
        if line.strip() == nextelem:
            if newfile is not None:
                newfile.close()
            nextelem = licycle.next()
            newfile = open(line.strip() + '.txt', 'w')

        # Case 2: Print line to current section
        elif newfile is not None:
            newfile.write(line)

        line = oldfile.readline()

    # Close the file for the last section
    if newfile is not None:
        newfile.close()

If the line is the next section header, the code closes the current file and starts writing to a new one. Otherwise, it keeps writing to the current file.

P.S.: Below is the example file that I used:
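If you are on Python 3 (where raw_input and licycle.next() no longer exist), the same single-pass idea might be sketched as follows. Note the differences, which are my own choices and not part of the code above: it is wrapped in a hypothetical function split_sections, it opens the input in text mode, it takes an out_dir parameter, and it treats any line matching one of the headers as the start of a new file instead of tracking only the next expected header with cycle.

```python
import os

def split_sections(path, headers, out_dir="."):
    """Write each section of `path` to '<header>.txt' inside `out_dir`."""
    wanted = set(headers)
    newfile = None
    written = []
    try:
        with open(path, "r") as oldfile:
            for line in oldfile:
                name = line.strip()
                if name in wanted:
                    # Case 1: header found, so switch to a new output file.
                    if newfile is not None:
                        newfile.close()
                    out_path = os.path.join(out_dir, name + ".txt")
                    newfile = open(out_path, "w")
                    written.append(out_path)
                elif newfile is not None:
                    # Case 2: ordinary line, so write it to the current section.
                    newfile.write(line)
    finally:
        # Make sure the last section file is closed, even on error.
        if newfile is not None:
            newfile.close()
    return written
```

Usage would be something like split_sections("data.txt", dimensionsList), where "data.txt" is a placeholder name. One caveat of matching any header: if the same header appears twice in the input, the second occurrence overwrites the first file.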

Section 1
aaaa
bbbb
Section 2
ccc
ddd
Section 3
eee
fff
Section 4
ggg
hhh
Section 5
iii
jjj
