Auto-learner

Reputation: 1511

How to compare a substring of a list item with items in another list using Python

I am trying to check whether certain files exist in a folder. The expected file names are stored in one list (expected_file_names), and the actual file names read from the folder are returned in another list (actual_file_names). I can get the file names from the folder, but how do I iterate over each item in actual_file_names and check whether a substring of it matches an item in the other list?

Goal

I want to find out whether a file whose name starts with DMAMiddleware exists in the folder. The actual file name looks like DMAMiddleware10.20.20.jar, so I need to check whether the substring DMAMiddleware occurs in an item of the actual list for each item of the expected list.

Problem

I am not clear on how to compare substrings of the items in one list against another list.

Can someone provide an example of how this can be achieved? Thanks in advance.

actual_file_names = ['', 'python', 'cmdb_dma_map.json', 'mappings.json', 'vendor_provided_binaries.json',
                     'vendor_provided_binaries_custom.json', 'DMAPremiumDatabase10.50.000.000.jar',
                     'DMAPremiumMiddleware10.50.000.000.jar', 'DMAPremiumUtilities10.50.000.000.jar',
                     'dma_oo_client_bin_linux.zip', 'dma_oo_client_bin_linux.zip.MD5', 'dma_oo_client_code_linux.zip',
                     'dma_oo_client_code_linux.zip.MD5', 'DCAFlowUtilities1.0.0.0.jar', 'DCAKafkaWriter1.0.0.0.jar',
                     'DCAUtilities1.0.0.0.jar']

expected_file_names = ['python', 'cmdb_dma_map.json', 'mappings.json', 'vendor_provided_binaries.json',
                       'vendor_provided_binaries_custom.json', 'DMAPremiumDatabase.jar', 'DMAPremiumMiddleware.jar',
                       'DMAPremiumUtilities.jar']

for f in expected_file_names:
    for g in actual_file_names:
        if f in g:
            print("All file names exist in " + g)
        else:
            print("file name " + g + " doesn't exist")

Upvotes: 0

Views: 298

Answers (2)

Arockia

Reputation: 440

Here are two ways of achieving this. The first is a little more complicated (a single list comprehension); the second is the traditional way, with nested loops.

actual_file_names = ['', 'python', 'cmdb_dma_map.json', 'mappings.json', 'vendor_provided_binaries.json',
                     'vendor_provided_binaries_custom.json', 'DMAPremiumDatabase10.50.000.000.jar',
                     'DMAPremiumMiddleware10.50.000.000.jar', 'DMAPremiumUtilities10.50.000.000.jar',
                     'dma_oo_client_bin_linux.zip', 'dma_oo_client_bin_linux.zip.MD5', 'dma_oo_client_code_linux.zip',
                     'dma_oo_client_code_linux.zip.MD5', 'DCAFlowUtilities1.0.0.0.jar', 'DCAKafkaWriter1.0.0.0.jar',
                     'DCAUtilities1.0.0.0.jar']

expected_file_names = ['python', 'cmdb_dma_map.json', 'mappings.json', 'vendor_provided_binaries.json',
                       'vendor_provided_binaries_custom.json', 'DMAPremiumDatabase.jar', 'DMAPremiumMiddleware.jar',
                       'DMAPremiumUtilities.jar']

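# First way: label every actual name in one list comprehension.
# efn.split(".")[0] is the expected name up to its first period.
print([str(afn) + " is valid"
       if any(efn.split(".")[0] in afn for efn in expected_file_names)
       else str(afn) + "N/A"
       for afn in actual_file_names])

# Second way: nested loops, printing every actual name that contains an expected stem.
for efn in expected_file_names:
    for afn in actual_file_names:
        if efn.split(".")[0] in afn:
            print(afn)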

Output:

['N/A', 'python is valid', 'cmdb_dma_map.json is valid', 'mappings.json is valid', 'vendor_provided_binaries.json is valid', 'vendor_provided_binaries_custom.json is valid', 'DMAPremiumDatabase10.50.000.000.jar is valid', 'DMAPremiumMiddleware10.50.000.000.jar is valid', 'DMAPremiumUtilities10.50.000.000.jar is valid', 'dma_oo_client_bin_linux.zipN/A', 'dma_oo_client_bin_linux.zip.MD5N/A', 'dma_oo_client_code_linux.zipN/A', 'dma_oo_client_code_linux.zip.MD5N/A', 'DCAFlowUtilities1.0.0.0.jarN/A', 'DCAKafkaWriter1.0.0.0.jarN/A', 'DCAUtilities1.0.0.0.jarN/A']
python
cmdb_dma_map.json
mappings.json
vendor_provided_binaries.json
vendor_provided_binaries_custom.json
vendor_provided_binaries_custom.json
DMAPremiumDatabase10.50.000.000.jar
DMAPremiumMiddleware10.50.000.000.jar
DMAPremiumUtilities10.50.000.000.jar
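
If the goal is only to learn which expected files are absent, the same idea can be turned around to collect the expected names that have no counterpart. This is a minimal sketch, not part of the answer above: the helper name find_missing is hypothetical, the lists are shortened samples of the ones in the question, and it assumes an actual file counts as a match when it starts with the expected stem and ends with the expected extension.

```python
import os

def find_missing(expected, actual):
    """Return the expected names that have no counterpart in `actual`."""
    missing = []
    for name in expected:
        # 'DMAPremiumDatabase.jar' -> ('DMAPremiumDatabase', '.jar')
        stem, ext = os.path.splitext(name)
        # Assumption: a counterpart starts with the stem and ends with the extension.
        if not any(a.startswith(stem) and a.endswith(ext) for a in actual):
            missing.append(name)
    return missing

# Shortened sample data based on the lists in the question.
actual = ['', 'python', 'mappings.json', 'DMAPremiumMiddleware10.50.000.000.jar']
expected = ['python', 'mappings.json', 'DMAPremiumMiddleware.jar', 'DMAPremiumDatabase.jar']

missing = find_missing(expected, actual)
print(missing)  # ['DMAPremiumDatabase.jar']
```

An empty result means every expected file was found. Using startswith instead of in keeps a stem from matching in the middle of an unrelated file name.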

Upvotes: 0

Thomas Kühn

Reputation: 9810

Here is one way to do it. The code includes two examples: the first keeps only the part of the expected filename before the first period (.), and the second additionally removes all digits from that part. With your input, both examples give the same result.

import re

actual_file_names = ['', 'python', 'cmdb_dma_map.json', 'mappings.json', 
                     'vendor_provided_binaries.json',
                     'vendor_provided_binaries_custom.json', 
                     'DMAPremiumDatabase10.50.000.000.jar',
                     'DMAPremiumMiddleware10.50.000.000.jar', 
                     'DMAPremiumUtilities10.50.000.000.jar',
                     'dma_oo_client_bin_linux.zip', 
                     'dma_oo_client_bin_linux.zip.MD5', 
                     'dma_oo_client_code_linux.zip',
                     'dma_oo_client_code_linux.zip.MD5', 
                     'DCAFlowUtilities1.0.0.0.jar', 'DCAKafkaWriter1.0.0.0.jar',
                     'DCAUtilities1.0.0.0.jar']

expected_file_names = ['python', 'cmdb_dma_map.json', 'mappings.json', 
                       'vendor_provided_binaries.json',
                       'vendor_provided_binaries_custom.json', 
                       'DMAPremiumDatabase.jar', 'DMAPremiumMiddleware.jar',
                       'DMAPremiumUtilities.jar']


##compare using only the part before the first period:
for expected in expected_file_names:
    part = expected.split('.',1)[0]
    ##print(part)
    matched = False
    for actual in actual_file_names:
        if part in actual:
            print('{} matches {}'.format(expected,actual))
            matched = True

    if not matched:
        print('{} could not be matched'.format(expected))

print('-'*50)

##additionally remove all digits from that part:
for expected in expected_file_names:
    part = re.sub('[0123456789]','',expected.split('.',1)[0])
    ##print(part)
    matched = False
    for actual in actual_file_names:
        if part in actual:
            print('{} matches {}'.format(expected,actual))
            matched = True

    if not matched:
        print('{} could not be matched'.format(expected))

The result is:

python matches python
cmdb_dma_map.json matches cmdb_dma_map.json
mappings.json matches mappings.json
vendor_provided_binaries.json matches vendor_provided_binaries.json
vendor_provided_binaries.json matches vendor_provided_binaries_custom.json
vendor_provided_binaries_custom.json matches vendor_provided_binaries_custom.json
DMAPremiumDatabase.jar matches DMAPremiumDatabase10.50.000.000.jar
DMAPremiumMiddleware.jar matches DMAPremiumMiddleware10.50.000.000.jar
DMAPremiumUtilities.jar matches DMAPremiumUtilities10.50.000.000.jar
--------------------------------------------------
python matches python
cmdb_dma_map.json matches cmdb_dma_map.json
mappings.json matches mappings.json
vendor_provided_binaries.json matches vendor_provided_binaries.json
vendor_provided_binaries.json matches vendor_provided_binaries_custom.json
vendor_provided_binaries_custom.json matches vendor_provided_binaries_custom.json
DMAPremiumDatabase.jar matches DMAPremiumDatabase10.50.000.000.jar
DMAPremiumMiddleware.jar matches DMAPremiumMiddleware10.50.000.000.jar
DMAPremiumUtilities.jar matches DMAPremiumUtilities10.50.000.000.jar

Tested on Python 3.5
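
Note that in the output above vendor_provided_binaries.json also matches vendor_provided_binaries_custom.json, because a plain substring test accepts anything that contains the stem. If that is not wanted, a stricter comparison is possible. The following is only a sketch under an assumption of mine: that the actual name is the expected stem, followed by an optional version made of digits and periods, followed by the expected extension. The function name is_versioned_match is hypothetical; re.fullmatch requires Python 3.4 or later.

```python
import os
import re

def is_versioned_match(expected, actual):
    """True if `actual` is `expected` with an optional version before the extension."""
    # 'DMAPremiumDatabase.jar' -> ('DMAPremiumDatabase', '.jar')
    stem, ext = os.path.splitext(expected)
    # Assumed naming scheme: stem + digits/periods (possibly none) + extension.
    pattern = re.escape(stem) + r'[0-9.]*' + re.escape(ext)
    return re.fullmatch(pattern, actual) is not None

print(is_versioned_match('DMAPremiumDatabase.jar',
                         'DMAPremiumDatabase10.50.000.000.jar'))   # True
print(is_versioned_match('vendor_provided_binaries.json',
                         'vendor_provided_binaries_custom.json'))  # False
print(is_versioned_match('python', 'python'))                      # True
```

re.escape keeps the period in the extension from being treated as a wildcard, and fullmatch forces the whole actual name to fit the pattern rather than just a part of it.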

Upvotes: 2
