Tim

Reputation: 63

Comparing part of a string within a list

I have a list of strings:

fileList = ['YMML.2019.09.10-Run.1-Final.pdf',
            'YMML.2019.09.10-Run.2-Initial.pdf',
            'YMML.2019.09.11-Run.2-Initial.pdf',
            'YMML.2019.09.11-Run.1-Final.pdf',
            'YMML.2019.09.12-Run.2-Initial.pdf',
            'YMML.2019.09.13-Run.2-Initial.pdf',
            'YMML.2019.09.12-Run.1-Final.pdf',
            'YMML.2019.09.13-Run.1-Final.pdf',
            'YMML.2019.09.14-Run.1-Final.pdf',]

and I'd like to confirm that each date has both a Run.1-Final and a Run.2-Initial file.

I've tried something like:

for i in range(len(directoryList)):
    if directoryList[i][5:15] != directoryList[i + 1][5:15]:
        print(directoryList[i] + ' is missing.')
    i += 2

and I'd like the output to be

YMML.2019.09.14-Run.2-Initial.pdf is missing.

Perhaps something like

dates = [directoryList[i][5:15] for i in range(len(directoryList))]
counter = collections.Counter(dates)

But then I'm having trouble extracting the dates with a missing file from the resulting Counter.

Upvotes: 0

Views: 65

Answers (4)

ggorlen

Reputation: 56865

Here's an O(n) solution which collects items into a defaultdict by date, then filters on quantity seen, restoring original names from the remaining value:

from collections import defaultdict

files = [
    'YMML.2019.09.10-Run.1-Final.pdf',
    'YMML.2019.09.10-Run.2-Initial.pdf',
    'YMML.2019.09.11-Run.2-Initial.pdf',
    'YMML.2019.09.11-Run.1-Final.pdf',
    'YMML.2019.09.12-Run.2-Initial.pdf',
    'YMML.2019.09.13-Run.2-Initial.pdf',
    'YMML.2019.09.12-Run.1-Final.pdf',
    'YMML.2019.09.13-Run.1-Final.pdf',
    'YMML.2019.09.14-Run.1-Final.pdf',
]

seen = defaultdict(list)

for x in files:
    seen[x[5:15]].append(x)

missing = [v[0] for k, v in seen.items() if len(v) < 2]
print(missing) # => ['YMML.2019.09.14-Run.1-Final.pdf']

Getting names of partners can be done with a conditional:

names = [
    x[:20] + "2-Initial.pdf" if x[20] == "1" else
    x[:20] + "1-Final.pdf" for x in missing
]
print(names) # => ['YMML.2019.09.14-Run.2-Initial.pdf']
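
The length check above assumes each date has at most one file of each kind. As a minimal sketch of a variant that records which run suffixes were seen per date and reports any expected suffix that is absent: the names `EXPECTED` and `find_missing` are illustrative assumptions, not part of the original answer, and the fixed slice positions assume every name follows the `YMML.YYYY.MM.DD-Run.` pattern from the question.

```python
from collections import defaultdict

# Suffixes every date is expected to have (assumed from the question).
EXPECTED = {"1-Final.pdf", "2-Initial.pdf"}


def find_missing(files):
    """Return the names of expected files that are absent from files."""
    seen = defaultdict(set)
    for name in files:
        # name[:20] is 'YMML.YYYY.MM.DD-Run.', name[20:] is the run suffix.
        seen[name[:20]].add(name[20:])
    return [
        prefix + suffix
        for prefix, suffixes in seen.items()
        for suffix in sorted(EXPECTED - suffixes)
    ]


sample = [
    "YMML.2019.09.10-Run.1-Final.pdf",
    "YMML.2019.09.10-Run.2-Initial.pdf",
    "YMML.2019.09.14-Run.1-Final.pdf",
]
print(find_missing(sample))  # => ['YMML.2019.09.14-Run.2-Initial.pdf']
```

Because the suffixes are stored in a set, a date that happens to list the same file twice is still reported as missing its partner.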

Upvotes: 1

arj7192

Reputation: 21

This works:

fileList = ['YMML.2019.09.10-Run.1-Final.pdf',
            'YMML.2019.09.10-Run.2-Initial.pdf',
            'YMML.2019.09.11-Run.2-Initial.pdf',
            'YMML.2019.09.11-Run.1-Final.pdf',
            'YMML.2019.09.12-Run.2-Initial.pdf',
            'YMML.2019.09.13-Run.2-Initial.pdf',
            'YMML.2019.09.12-Run.1-Final.pdf',
            'YMML.2019.09.13-Run.1-Final.pdf',
            'YMML.2019.09.14-Run.1-Final.pdf',]

initial_set = {filename[:15] for filename in fileList if 'Initial' in filename}
final_set = {filename[:15] for filename in fileList if 'Final' in filename}

for filename in final_set - initial_set:
    print(filename + '-Run.2-Initial.pdf is missing.')
for filename in initial_set - final_set:
    print(filename + '-Run.1-Final.pdf is missing.')

Upvotes: 0

Appa21

Reputation: 258

I'm kind of late, but here's what I found to be the simplest way, though maybe not the most efficient:

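for file in fileList:
    # file[:20] is 'YMML.YYYY.MM.DD-Run.'; the rest identifies the run.
    if file[20:] == "1-Final.pdf":
        if (file[:20] + "2-Initial.pdf") not in fileList:
            print(file)  # this Final file has no matching Initial
    elif file[20:] == "2-Initial.pdf":  # compare with ==, not `is`
        if (file[:20] + "1-Final.pdf") not in fileList:
            print(file)  # this Initial file has no matching Final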
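
On the efficiency point: each `in fileList` test scans the whole list, so the loop is quadratic in the number of files. Here is a sketch of the same idea with a set for constant-time lookups. `missing_partners` and `partner_suffix` are hypothetical names, and the helper returns the missing partner's name rather than printing the file that lacks one.

```python
def missing_partners(file_list):
    """Return names of partner files that are not present in file_list."""
    present = set(file_list)  # average O(1) membership tests
    partner_suffix = {
        "1-Final.pdf": "2-Initial.pdf",
        "2-Initial.pdf": "1-Final.pdf",
    }
    missing = []
    for name in file_list:
        # Assumes the 'YMML.YYYY.MM.DD-Run.' prefix is always 20 characters.
        prefix, suffix = name[:20], name[20:]
        other = partner_suffix.get(suffix)
        if other is not None and prefix + other not in present:
            missing.append(prefix + other)
    return missing


print(missing_partners([
    "YMML.2019.09.13-Run.2-Initial.pdf",
    "YMML.2019.09.13-Run.1-Final.pdf",
    "YMML.2019.09.14-Run.1-Final.pdf",
]))  # => ['YMML.2019.09.14-Run.2-Initial.pdf']
```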

Upvotes: 1

aybry

Reputation: 316

To make it more readable, you could create a list of dates first, then loop over those.

file_list = ['YMML.2019.09.10-Run.1-Final.pdf',
            'YMML.2019.09.10-Run.2-Initial.pdf',
            'YMML.2019.09.11-Run.2-Initial.pdf',
            'YMML.2019.09.11-Run.1-Final.pdf',
            'YMML.2019.09.12-Run.2-Initial.pdf',
            'YMML.2019.09.13-Run.2-Initial.pdf',
            'YMML.2019.09.12-Run.1-Final.pdf',
            'YMML.2019.09.13-Run.1-Final.pdf',
            'YMML.2019.09.14-Run.1-Final.pdf',]

dates = set([item[5:15] for item in file_list])

for date in dates:
    if 'YMML.' + date + '-Run.1-Final.pdf' not in file_list:
        print('YMML.' + date + '-Run.1-Final.pdf is missing')
    if 'YMML.' + date + '-Run.2-Initial.pdf' not in file_list:
        print('YMML.' + date + '-Run.2-Initial.pdf is missing')

set() keeps only the unique dates, so each date is checked once instead of once per file.
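
One consequence of using a set is that its iteration order is arbitrary, so the messages may come out in any date order. Here is a small sketch, assuming chronological output is wanted: because the dates are zero-padded `YYYY.MM.DD` strings, plain `sorted()` orders them chronologically. The function name `report_missing` is illustrative only.

```python
def report_missing(file_list):
    """Return 'is missing' messages in chronological date order."""
    present = set(file_list)
    # Zero-padded YYYY.MM.DD strings sort chronologically as plain text.
    dates = sorted({item[5:15] for item in file_list})
    messages = []
    for date in dates:
        for suffix in ("-Run.1-Final.pdf", "-Run.2-Initial.pdf"):
            name = "YMML." + date + suffix
            if name not in present:
                messages.append(name + " is missing")
    return messages


for line in report_missing([
    "YMML.2019.09.14-Run.1-Final.pdf",
    "YMML.2019.09.10-Run.2-Initial.pdf",
]):
    print(line)
# YMML.2019.09.10-Run.1-Final.pdf is missing
# YMML.2019.09.14-Run.2-Initial.pdf is missing
```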

Upvotes: 1
