Akumarzen

Reputation:

Find missing numbers in a sequence of alphanumeric filenames - Python

How would I write a function in Python to determine if a list of filenames matches a given pattern and which files are missing from that pattern? For example:

Input -->

KUMAR.3.txt
KUMAR.4.txt
KUMAR.6.txt
KUMAR.7.txt
KUMAR.9.txt
KUMAR.10.txt
KUMAR.11.txt
KUMAR.13.txt
KUMAR.15.txt
KUMAR.16.txt

Desired Output -->

KUMAR.5.txt
KUMAR.8.txt
KUMAR.12.txt
KUMAR.14.txt

Input -->

KUMAR3.txt
KUMAR4.txt
KUMAR6.txt
KUMAR7.txt
KUMAR9.txt
KUMAR10.txt
KUMAR11.txt
KUMAR13.txt
KUMAR15.txt
KUMAR16.txt

Desired Output -->

KUMAR5.txt
KUMAR8.txt
KUMAR12.txt
KUMAR14.txt

Upvotes: 1

Views: 847

Answers (2)

John Fouhy

Reputation: 42193

You can approach this in three steps:

  1. Convert the filenames to appropriate integers.
  2. Find the missing numbers.
  3. Combine the missing numbers with the filename template as output.

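For (1), if the filename structure is predictable, this is easy: slice out the characters between the fixed prefix and the '.txt' suffix and convert them to an integer.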

def to_num(s, start=6):
    return int(s[start:s.index('.txt')])
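
As a quick check of how the start offset relates to the two filename styles in the question, here is a minimal sketch. It assumes the prefix is always 'KUMAR.' (six characters) or 'KUMAR' (five characters); the variable names dotted and undotted are illustrative only.

```python
def to_num(s, start=6):
    # Slice from `start` up to the '.txt' suffix and parse the digits.
    return int(s[start:s.index('.txt')])

# 'KUMAR.' is six characters long, so the default start fits the dotted form.
dotted = to_num('KUMAR.10.txt')
# 'KUMAR' is five characters long, so the undotted form needs start=5.
undotted = to_num('KUMAR10.txt', start=5)
print(dotted, undotted)
```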

Given:

lst = ['KUMAR.3.txt', 'KUMAR.4.txt', 'KUMAR.6.txt', 'KUMAR.7.txt',
       'KUMAR.9.txt', 'KUMAR.10.txt', 'KUMAR.11.txt', 'KUMAR.13.txt',
       'KUMAR.15.txt', 'KUMAR.16.txt']

you can get the known numbers with map(to_num, lst) (on Python 3, wrap it in list(...), because map returns a one-shot iterator there). To look for gaps, you only need the minimum and maximum: the range function gives every number you should see between them, and removing the numbers you actually have leaves the missing ones. Sets are helpful here.

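def find_gaps(int_list):
    # Materialize once so this also works when given an iterator such as map(...).
    known = set(int_list)
    # max(known) is always present, so leaving it out of the range is harmless.
    return sorted(set(range(min(known), max(known))) - known)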
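
To make the set arithmetic concrete, here is the same computation spelled out on the numbers from the question. This is only an illustration of the step above; the names known, expected and gaps are not part of the original answer.

```python
known = [3, 4, 6, 7, 9, 10, 11, 13, 15, 16]

# Every number from the minimum up to (but not including) the maximum.
# The maximum itself is in `known`, so it can never be a gap.
expected = set(range(min(known), max(known)))

# Set difference leaves the numbers that should exist but do not.
gaps = sorted(expected - set(known))
print(gaps)  # [5, 8, 12, 14]
```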

Putting it all together:

# list(...) keeps this working on Python 3, where map returns a one-shot iterator.
missing = find_gaps(list(map(to_num, lst)))
for i in missing:
    print('KUMAR.%d.txt' % i)
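
The question also lists an undotted form (KUMAR3.txt). One way to cover both with the same slicing idea is to pass the prefix in explicitly. The sketch below is an assumption about how that could look, not part of the original answer; missing_files and its prefix and suffix parameters are hypothetical names.

```python
def missing_files(names, prefix, suffix='.txt'):
    """Return the filenames missing between the lowest and highest number seen."""
    numbers = set()
    for name in names:
        # Reject anything that does not fit prefix + digits + suffix.
        if not (name.startswith(prefix) and name.endswith(suffix)):
            raise ValueError('unexpected filename: %r' % name)
        # int() raises ValueError if the middle part is not a plain number.
        numbers.add(int(name[len(prefix):len(name) - len(suffix)]))
    gaps = sorted(set(range(min(numbers), max(numbers))) - numbers)
    return ['%s%d%s' % (prefix, n, suffix) for n in gaps]

undotted = ['KUMAR3.txt', 'KUMAR4.txt', 'KUMAR6.txt', 'KUMAR7.txt',
            'KUMAR9.txt', 'KUMAR10.txt', 'KUMAR11.txt', 'KUMAR13.txt',
            'KUMAR15.txt', 'KUMAR16.txt']
for name in missing_files(undotted, 'KUMAR'):
    print(name)
```

Calling missing_files(lst, 'KUMAR.') handles the dotted list the same way. A filename that does not fit the given prefix and suffix raises ValueError, which doubles as the "does the list match the pattern" check from the question.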

Upvotes: 2

John Millikin

Reputation: 200836

Assuming the patterns are relatively static, this is easy enough with a regex:

import re

inlist = "KUMAR.3.txt KUMAR.4.txt KUMAR.6.txt KUMAR.7.txt KUMAR.9.txt KUMAR.10.txt KUMAR.11.txt KUMAR.13.txt KUMAR.15.txt KUMAR.16.txt".split()

def get_count(s):
    # Raw string so the backslashes reach the regex engine unchanged.
    return int(re.match(r'.*\.(\d+)\..*', s).group(1))

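values = set(map(get_count, inlist))
# Take min/max of the parsed numbers rather than the first and last entries,
# so the input does not have to be sorted.
mincount = min(values)
maxcount = max(values)
for ii in range(mincount, maxcount):
    if ii not in values:
        print('KUMAR.%d.txt' % ii)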
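
The regex above requires a dot before the number, so it does not match the undotted form (KUMAR3.txt). A possible variation, sketched here under the assumption that each filename contains exactly one run of digits, captures the prefix and suffix as well, so the template is derived from the input and a mixed list is rejected. NAME_RE and find_missing are hypothetical names, not part of the original answer.

```python
import re

# Assumed shape: non-digit prefix, one run of digits, non-digit suffix.
NAME_RE = re.compile(r'(\D*)(\d+)(\D*)')

def find_missing(names):
    templates = set()
    numbers = set()
    for name in names:
        m = NAME_RE.fullmatch(name)
        if m is None:
            raise ValueError('no single counter found in %r' % name)
        prefix, num, suffix = m.groups()
        templates.add((prefix, suffix))
        numbers.add(int(num))
    # All names must share the same prefix and suffix to count as one pattern.
    if len(templates) != 1:
        raise ValueError('filenames do not share one pattern: %r' % sorted(templates))
    prefix, suffix = templates.pop()
    return [prefix + str(n) + suffix
            for n in range(min(numbers), max(numbers))
            if n not in numbers]

inlist = ("KUMAR.3.txt KUMAR.4.txt KUMAR.6.txt KUMAR.7.txt KUMAR.9.txt "
          "KUMAR.10.txt KUMAR.11.txt KUMAR.13.txt KUMAR.15.txt KUMAR.16.txt").split()
for name in find_missing(inlist):
    print(name)
```

Because the prefix is read from the filenames, the same call works for the undotted list; mixing the two styles in one list raises ValueError instead of silently producing a wrong template.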

Upvotes: 1
