Python: referring to each duplicate item in a list by unique index

Question

I am trying to extract particular lines from a text output file. The lines I am interested in are a few lines above and a few lines below the key string that I use to search through the results. The key string is the same for each result.

fi = open('Inputfile.txt')
fo = open('Outputfile.txt', 'a')

lines = fi.readlines()
filtered_list=[]

for item in lines:
    if item.startswith("key string"):
        filtered_list.append(lines[lines.index(item)-2])
        filtered_list.append(lines[lines.index(item)+6])
        filtered_list.append(lines[lines.index(item)+10])
        filtered_list.append(lines[lines.index(item)+11])       
fo.writelines(filtered_list)

fi.close()
fo.close()

The output file contains the right lines for the first record, but repeated once for every record in the file. How can I update the indexing so that it reads each individual record? I've tried to find a solution, but as a novice programmer I struggled to use the enumerate() function or the collections package.

Ryszard Szopa · Accepted Answer

First of all, it would help if you said exactly what goes wrong with your code (a stack trace, no output at all, etc.). Anyway, here are some thoughts. You can divide your problem into subproblems to make it easier to work with. In this case, let's separate finding the relevant lines from collecting them.

First, let's find the indexes of all the relevant lines.

key = "key string"
relevant = []
for i, item in enumerate(lines):
    if item.startswith(key):
        relevant.append(i)  # store the index, not the line itself

enumerate is actually quite simple. It takes a list (or any other iterable) and yields (index, item) pairs. So, list(enumerate(['a', 'b', 'c'])) gives [(0, 'a'), (1, 'b'), (2, 'c')].
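As a small illustration (the sample list below is made up, not taken from the question), here is how enumerate differs from list.index when a list contains duplicates. list.index always returns the position of the first equal item, which is probably why the original loop keeps writing the lines of the first record.

```python
# Hypothetical sample data: two records that share the same key line.
lines = ["a1\n", "key string\n", "b1\n", "a2\n", "key string\n", "b2\n"]

# list.index returns the position of the FIRST equal item, every time.
first_match = lines.index("key string\n")

# enumerate pairs each item with its own position, so duplicates stay distinct.
positions = [i for i, item in enumerate(lines) if item.startswith("key string")]

print(first_match)  # 1
print(positions)    # [1, 4]
```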

What I wrote above can also be achieved with a list comprehension:

relevant = [i for (i, item) in enumerate(lines) if item.startswith(key)]

So, we have the indexes of the relevant lines. Now, let's collect them. For each match you want the line 2 lines before it and the lines 6, 10 and 11 lines after it. If the key appears in one of the first two lines, you have a problem: the index becomes negative, and you don't really want lines[-1], because that's the last item! Also, you need to handle the situation in which an offset would take you past the end of the list; otherwise Python will raise an IndexError.

out = []
for r in relevant:
    for offset in -2, 6, 10, 11:
        index = r + offset
        if 0 <= index < len(lines):  # index 0 is a valid line
            out.append(lines[index])

You could also catch the IndexError, but that won't save us much typing, as we have to handle negative indexes anyway.
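For comparison, a minimal sketch of the try/except variant might look like this. The helper name pick_lines and the sample data are hypothetical. Note that the negative case still needs an explicit check, because a negative index does not raise an error.

```python
def pick_lines(lines, r, offsets=(-2, 6, 10, 11)):
    """Return lines at r + offset, skipping offsets that fall outside the list."""
    picked = []
    for offset in offsets:
        index = r + offset
        if index < 0:
            # Negative indexes do not raise; they count from the end,
            # so they have to be skipped by hand.
            continue
        try:
            picked.append(lines[index])
        except IndexError:
            # The offset ran past the end of the list.
            pass
    return picked

# Hypothetical sample: eight lines, with the key assumed to be at index 1.
sample = [f"line {n}\n" for n in range(8)]
print(pick_lines(sample, 1))  # ['line 7\n']
```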

The whole program would look like this:

key = "key string"
with open('Inputfile.txt') as fi:
    lines = fi.readlines()

relevant = [i for (i, item) in enumerate(lines) if item.startswith(key)]
out = []
for r in relevant:
    for offset in -2, 6, 10, 11:
        index = r + offset
        if 0 <= index < len(lines):  # index 0 is a valid line
            out.append(lines[index])

with open('Outputfile.txt', 'a') as fo:
    fo.writelines(out)
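If you want to try the logic without touching any files, one option is to wrap it in a function and feed it a small in-memory list. This is only a sketch: the function name extract_context, the sample records and the shortened offsets are made up for the example, and the bounds check treats index 0 as a valid position.

```python
def extract_context(lines, key, offsets=(-2, 6, 10, 11)):
    """Collect lines at the given offsets around every line starting with key."""
    out = []
    for r, item in enumerate(lines):
        if not item.startswith(key):
            continue
        for offset in offsets:
            index = r + offset
            if 0 <= index < len(lines):
                out.append(lines[index])
    return out

# Hypothetical data: two 4-line records, with smaller offsets to keep it short.
sample = ["id 1\n", "x\n", "key string\n", "value 1\n",
          "id 2\n", "y\n", "key string\n", "value 2\n"]
result = extract_context(sample, "key string", offsets=(-2, 1))
print(result)  # ['id 1\n', 'value 1\n', 'id 2\n', 'value 2\n']
```

Each record now contributes its own lines, because the position comes from enumerate rather than from a search for the first equal line.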
