Subtract string in text file based on index list in Python

Question

I have a .txt file from which I want to extract the text between certain locations in Python. To do this, I made a list of the location indexes, so I can use them to pull out the text between each pair of locations and append it to a different .txt file.

For instance (pseudocode):

indexList = [188, 1089, 364, 5697, 2, 5230, 2683, 2956]

with open(str(mytxtfile), 'r') as f:
    for line in f:
        # extract the text between location 2683 and location 2956
        # append this to a txt variable
        # repeat this for the entire list

vasia · Accepted Answer

You can represent the start/end character positions as a list of tuples. Read the entire contents of the file into a string variable with the file object's read() method, then use string slicing to get the text at specific offsets. The end index of a slice is exclusive, e.g. for x = "abcdefg", x[2:5] is "cde".

indexList = [(188, 1089), (364, 5697), (2, 5230), (2683, 2956)]

with open(str(mytxtfile), 'r') as f:
    contents = f.read()

textFragments = []
for start, end in indexList:
    textFragments.append(contents[start:end])

# textFragments[0] = text between positions 188 and 1089
# textFragments[1] = text between positions 364 and 5697
# and so forth
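
If your positions start out as a flat list, as in the question, one way to get (start, end) tuples is to zip the even-indexed and odd-indexed elements together. This is only a sketch: it assumes the list alternates start, end, start, end, and the names pair_up and flatIndexList are made up for illustration.

```python
def pair_up(flat_positions):
    """Group a flat [start, end, start, end, ...] list into (start, end) tuples."""
    # Assumption: positions come in start/end pairs, so the length must be even.
    if len(flat_positions) % 2 != 0:
        raise ValueError("expected an even number of positions")
    # Even-indexed items are starts, odd-indexed items are ends.
    return list(zip(flat_positions[::2], flat_positions[1::2]))


flatIndexList = [188, 1089, 364, 5697, 2, 5230, 2683, 2956]
indexList = pair_up(flatIndexList)
print(indexList)
# [(188, 1089), (364, 5697), (2, 5230), (2683, 2956)]
```

If your pairs are meant to be grouped differently, build the tuples by hand instead.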

If you want all of these fragments in one string variable, you can concatenate them using join, like this: ''.join(textFragments)
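
Since the question also asks about appending the result to a different .txt file, here is a rough sketch of how that could look. The function name append_fragments and the file names are hypothetical, and the demo uses throwaway temporary files rather than your real data. Opening the output file in 'a' mode adds to the end of the file instead of overwriting it.

```python
import os
import tempfile


def append_fragments(source_path, output_path, index_list):
    """Slice each (start, end) range out of source_path and append the joined text to output_path."""
    with open(source_path, 'r') as f:
        contents = f.read()
    fragments = [contents[start:end] for start, end in index_list]
    with open(output_path, 'a') as out:  # 'a' appends rather than overwrites
        out.write(''.join(fragments))
    return fragments


# Small demonstration with temporary files (illustrative data only)
tmp_dir = tempfile.mkdtemp()
source_path = os.path.join(tmp_dir, 'source.txt')
output_path = os.path.join(tmp_dir, 'output.txt')
with open(source_path, 'w') as f:
    f.write('abcdefghijklmnopqrstuvwxyz')

append_fragments(source_path, output_path, [(2, 5), (10, 13)])
with open(output_path, 'r') as f:
    result = f.read()
print(result)  # cdeklm
```

Note that the offsets are character positions in the string returned by read(), so they need to have been computed against the same decoded text.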
