Alex Nikitin

Reputation: 524

Finding missing lines in file

I have a .txt file of 7000+ lines, each containing a label and the path to an image, ordered by image number. Example:

abnormal /Users/alex/Documents/X-ray-classification/data/images/1.png
abnormal /Users/alex/Documents/X-ray-classification/data/images/2.png
normal /Users/alex/Documents/X-ray-classification/data/images/3.png
normal /Users/alex/Documents/X-ray-classification/data/images/4.png

Some lines are missing, and I want to automate the search for them. Intuitively, I wrote:

f = open("data.txt", 'r')
lines = f.readlines()
num = 1
for line in lines:
    if num in line:
        continue
    else:
        print (line)
    num+=1

But of course it didn't work, since the lines are strings and num is an integer. Is there an elegant way to sort this out? Using regex, maybe? Thanks in advance!

Upvotes: 1

Views: 546

Answers (2)

match

Reputation: 11060

The following should work: it grabs the number out of each filename, checks whether it is more than 1 higher than the previous number, and if so, works out all the in-between numbers and prints them. Printing the number (and reconstructing the filename from it) is necessary because line never contains the names of the missing files during iteration.

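import re

# Set this to the first number in the series - 1
lastnum = 0

with open("data.txt", 'r') as f:
    for line in f:
        # Pick the number out of the filename (the digits just before ".png"),
        # so digits elsewhere in the path cannot interfere
        match = re.search(r'(\d+)\.png\s*$', line)
        if match is None:
            continue  # skip blank or malformed lines
        num = int(match.group(1))
        if num - lastnum > 1:
            # Report every number skipped since the previous line
            for i in range(lastnum + 1, num):
                print("Missing: {}.png".format(i))
        lastnum = num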

The main advantage of working this way is that, as long as the entries in the file are sorted, it can handle a series that starts at a number other than 1 (adjust the initial value accordingly), and it reports every missing number in a gap rather than just the first one.
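
If you would rather collect the missing numbers than print them, the same gap-walking idea can be wrapped in a function. This is only a sketch under a few assumptions: find_gaps, NUMBER_RE and the first parameter are names I made up for illustration, every relevant line is assumed to end in <number>.png, and the input is assumed to be sorted.

```python
import re

# Assumption: every relevant line ends with "<number>.png".
NUMBER_RE = re.compile(r"(\d+)\.png\s*$")


def find_gaps(lines, first=1):
    """Return the numbers missing from a sorted sequence of lines.

    `first` is the number the series is expected to start at.
    """
    missing = []
    last = first - 1
    for line in lines:
        m = NUMBER_RE.search(line)
        if m is None:
            continue  # skip blank or unrelated lines
        num = int(m.group(1))
        # range() is empty when num directly follows last
        missing.extend(range(last + 1, num))
        last = num
    return missing


sample = [
    "abnormal /Users/alex/Documents/X-ray-classification/data/images/1.png\n",
    "normal /Users/alex/Documents/X-ray-classification/data/images/4.png\n",
    "normal /Users/alex/Documents/X-ray-classification/data/images/5.png\n",
]
print(find_gaps(sample))  # [2, 3]
```

For the real file you could pass the open file object directly, e.g. find_gaps(f) inside a with open("data.txt") block, since the function only iterates over lines.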

Upvotes: 1

User that hates AI

Reputation: 468

You can try this:

lines = [
    "abnormal /Users/alex/Documents/X-ray-classification/data/images/1.png",
    "normal /Users/alex/Documents/X-ray-classification/data/images/3.png",
    "normal /Users/alex/Documents/X-ray-classification/data/images/4.png",
]
maxvalue = 4  # or any other maximum value
missing = []
i = 0
for num in range(1, maxvalue + 1):
    # Once the list is exhausted, every remaining number is missing.
    # Note: this is a plain substring test, so "1" also matches "11.png".
    if i >= len(lines) or str(num) not in lines[i]:
        missing.append(num)
    else:
        i += 1

print(missing)

Or, if you want to check that the line ends with XXX.png:

lines = [
    "abnormal /Users/alex/Documents/X-ray-classification/data/images/1.png",
    "normal /Users/alex/Documents/X-ray-classification/data/images/3.png",
    "normal /Users/alex/Documents/X-ray-classification/data/images/4.png",
]
maxvalue = 4  # or any other maximum value
missing = []
i = 0
for num in range(1, maxvalue + 1):
    # Include the "/" so that 1.png does not also match 11.png.
    # Once the list is exhausted, every remaining number is missing.
    if i >= len(lines) or not lines[i].endswith("/" + str(num) + ".png"):
        missing.append(num)
    else:
        i += 1

print(missing)
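
If the lines are not guaranteed to be sorted, a set difference avoids tracking the index i altogether. Treat this as a rough sketch rather than a drop-in solution: missing_numbers is a hypothetical helper name, the regex assumes each line ends in <number>.png, and the series is assumed to start at 1.

```python
import re


def missing_numbers(lines, maxvalue=None):
    """Return the sorted numbers in 1..maxvalue that have no matching line.

    If maxvalue is None, the largest number found in the lines is used.
    """
    present = set()
    for line in lines:
        # Assumption: the image number is the digits right before ".png"
        m = re.search(r"(\d+)\.png\s*$", line)
        if m:
            present.add(int(m.group(1)))
    if not present and maxvalue is None:
        return []  # nothing to compare against
    upper = maxvalue if maxvalue is not None else max(present)
    return sorted(set(range(1, upper + 1)) - present)


lines = [
    "normal /Users/alex/Documents/X-ray-classification/data/images/4.png",
    "abnormal /Users/alex/Documents/X-ray-classification/data/images/1.png",
    "normal /Users/alex/Documents/X-ray-classification/data/images/3.png",
]
print(missing_numbers(lines, maxvalue=4))  # [2]
```

The trade-off is that all numbers are held in memory at once, which should be fine for a file of a few thousand lines.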

Upvotes: 1
