zingy

Reputation: 811

Changing the contents of a file by applying different conditions

I am trying to make some changes to the contents of an input file. The input file looks like this:

18800000 20400000 pau
20400000 21300000 aa
21300000 22500000 p
22500000 23200000 l
23200000 24000000 ay
24000000 25000000 k
25000000 26500000 pau

This file is a transcription of an audio file. On each line, the first number is the start time, the second is the end time, and the letters that follow denote the sound.

Some of these sounds are diphthongs, i.e. they are made up of two different sounds, and each diphthong has to be split into its two component sounds. In the example above the diphthong is 'ay', which is made up of 'ao' and 'ih'. The duration of 'ay', which is 24000000 - 23200000 = 800000, is divided equally between these two sounds. So the line

23200000 24000000 ay

changes to

23200000 23600000 ao
23600000 24000000 ih

I have attempted to write some pseudocode, which looks rubbish.

def test(transcriptionFile) :
    with open("transcriptions.txt", "r+") as tFile :
        for line in tFile :
            if 3rd_item = ay
                duration = (2nd_item[1] - 1st_item[2]) / 2
                delete the line
                tFile.write(1st_item, 1st_item + d, ao)
                tfile.write(1st_item + d, 1st_item, ih) # next line

if__name__ == "__main__" :
    test("transcriptions.txt")  

Thank you.

With the suggestions I was given I changed the code to the following. It is still not correct.

def test(transcriptionFile) :
    with open("transcriptions.txt", "r") as tFile :
        inp = tFile.readlines()

    outp = []
    for ln in inp :
        start, end, sound = ln.strip()
        if sound == ay :
            duration = (end - start) / 2
            ln.delete
            start = start  
            end = start + duration
            sound = ao
            outp.append(ln)
            start = start + duration # next line 
            end = start
            sound = ih 
            outp.append(ln)

    with open("transcriptions.txt", "w") as tFile:
        tFile.writelines(outp)

__name__ == "__main__"
test("transcriptions.txt")     

Upvotes: 1

Views: 93

Answers (2)

ekhumoro

Reputation: 120608

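The following script should do what you want. It takes the source and destination file paths as its two command-line arguments and skips any line that does not split into exactly three fields: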

import sys

def main(src, dest):
    with open(dest, 'w') as output:
        with open(src) as source:
            for line in source:
                try:
                    start, end, sound = line.split()
                except ValueError:
                    continue
                if sound == 'ay':
                    start = int(start)
                    end = int(end)
                    offset = (end - start) // 2
                    output.write('%s %s ao\n' % (start, start + offset))
                    output.write('%s %s ih\n' % (start + offset, end))
                else:
                    output.write(line)

if __name__ == "__main__":
    main(*sys.argv[1:])

Output:

18800000 20400000 pau
20400000 21300000 aa
21300000 22500000 p
22500000 23200000 l
23200000 23600000 ao
23600000 24000000 ih
24000000 25000000 k
25000000 26500000 pau
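
The question mentions that there are several diphthongs, while the script above handles only 'ay'. One possible way to extend it is a lookup table, sketched below. The names DIPHTHONGS and split_segment are made up for this illustration, and only the 'ay' entry comes from the question; any other entries would have to be filled in by you.

```python
# Hypothetical lookup table: only 'ay' -> ('ao', 'ih') is taken from the
# question; further diphthongs would be added here.
DIPHTHONGS = {
    'ay': ('ao', 'ih'),
}

def split_segment(start, end, sound):
    """Return a list of (start, end, sound) tuples for one segment.

    A sound found in DIPHTHONGS is split at the midpoint into its two
    component sounds; any other sound is returned unchanged.
    """
    parts = DIPHTHONGS.get(sound)
    if parts is None:
        return [(start, end, sound)]
    middle = start + (end - start) // 2
    return [(start, middle, parts[0]), (middle, end, parts[1])]
```

Inside the loop, each returned tuple can then be written out with output.write('%s %s %s\n' % segment).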

Upvotes: 1

Fred Foo

Reputation: 363607

Editing a text file in-place is pretty hard. Your best options are:

  1. Write the program as a Unix filter, i.e. produce the new file on sys.stdout and put it in place with external tools.

  2. Read in the whole file, then construct the new file in memory and write it out.
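
As a rough sketch of the first option, assuming the 'ay' to 'ao' plus 'ih' split from the question: the generator below takes any iterable of lines, and main() wires it to standard input and output. The names filter_lines and main are illustrative only.

```python
import sys

def filter_lines(lines):
    """Yield output lines; an 'ay' line is replaced by 'ao' and 'ih' lines."""
    for line in lines:
        fields = line.split()
        if len(fields) == 3 and fields[2] == 'ay':
            start, end = int(fields[0]), int(fields[1])
            middle = start + (end - start) // 2
            yield '%d %d ao\n' % (start, middle)
            yield '%d %d ih\n' % (middle, end)
        else:
            # Everything else passes through untouched.
            yield line

def main():
    # Read from stdin and write to stdout, as a Unix filter does.
    sys.stdout.writelines(filter_lines(sys.stdin))
```

Saved under a hypothetical name such as split_ay.py, with a call to main() under the usual if __name__ == "__main__": guard, it could be run as python split_ay.py < transcriptions.txt > transcriptions.new, after which transcriptions.new is moved over the original file.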

A program following the second line of thought would look like:

# read transcriptions.txt into a list of lines
with open("transcriptions.txt", "r") as tFile:
    inp = tFile.readlines()

# do processing and build a new list of lines
outp = []
for ln in inp:
    if not to_be_deleted(ln):
        outp.append(transform(ln))

# now overwrite transcriptions.txt
with open("transcriptions.txt", "w") as tFile:
    tFile.writelines(outp)

It would be even better if you wrote the processing bit as a list comprehension:

outp = [transform(ln) for ln in inp
                      if not to_be_deleted(ln)]
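
Note that transform above returns one line per input line, whereas splitting a diphthong turns one line into two. A possible variation, with a hypothetical helper named expand that returns a list of lines, flattens the results with a nested comprehension. The sample inp below is only there to make the sketch self-contained.

```python
def expand(ln):
    """Hypothetical helper: return the list of lines that replace ln."""
    fields = ln.split()
    if len(fields) == 3 and fields[2] == 'ay':
        start, end = int(fields[0]), int(fields[1])
        middle = start + (end - start) // 2
        return ['%d %d ao\n' % (start, middle),
                '%d %d ih\n' % (middle, end)]
    return [ln]

# Two sample lines taken from the question's input file.
inp = ['22500000 23200000 l\n', '23200000 24000000 ay\n']

# One input line may produce several output lines, so flatten the lists.
outp = [new_ln for ln in inp for new_ln in expand(ln)]
```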

Upvotes: 2
