geoJshaun

Reputation: 709

python: extracting files based on partial title with split criteria in title

I have a list of files with inconsistent nomenclature:

pLst = ['CO_002_2016_Q4_Merge.loc', 'CO_002_2016_Merge.zip', 'CO_002_2016_q4_alias.loc', 'CO_002_2017_here_2017_q1_streets_alias.loc.xml', 'CO_002_2017_here_2017_q1_streets_parity.loc', 'AuburnAliasGCS_1984_1106.lox', 'CA_ORG_BCP.loc.xml', 'CA_ORG_BCP.loc', 'Co52 Alias Address Locator.lox', 'CO_002_2017_here_2017_q1_streets_parity.loc.xml', 'CentralCostaCountyStreets.lox', 'CO_002_2016_q4_alias.lox']

I want to extract the files whose names contain any of the strings in this list:

exCrt = ["2016_Q4", "2016_q4","2017"]

I would like to add every file whose name contains any element of exCrt to an extraction list, but I don't see how to separate the split delimiter ("_") from the selection criteria, since the delimiter is also part of some criteria.

I tried using any:

if any(x in pLst for x in exCrt):
    exLst.add(x)

which resulted in an empty set.

I also tried changing exCrt to ["2016","q4","Q4","2017"] and combining the tests with and/or:

for i in pLst:
    if exCrt[0] and exCrt[1] or exLst[0] and exCrt[2] or exCrt[3] in i.split("_"):
        exLst.add(i)

But this did not exclude any of the unwanted files.

I would like the output to be

( 'CO_002_2016_Q4_Composite.loc',
 'CO_002_2016_q4_alias.loc.xml',
 'CO_002_2016_Q4_Composite.loc.xml',
 'CO_002_2016_Q4_Merge.lox',
 'CO_002_2016_Q4_Merge.loc.xml',
 'CO_002_2016_Q4_Merge.loc',
 'CO_002_2016_q4_alias.loc',
 'CO_002_2016_q4_alias.lox',
 'CO_002_2017_here_2017_q1_streets_alias.lox',
 'CO_002_2017_here_2017_q1_streets_alias.loc',
 'CO_002_2017_here_2017_q1_streets_alias.loc.xml',
 'CO_002_2017_here_2017_q1_streets_parity.loc',
 'CO_002_2017_here_2017_q1_streets_parity.loc.xml')

Upvotes: 2

Views: 83

Answers (3)

coder

Reputation: 12972

A simple list comprehension could be used like so:

exLst = [i for i in pLst for j in exCrt if j in i]

This should work for the data shown.
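One thing to watch with a nested comprehension of this shape: a file name that matches more than one criterion is emitted once per match. The sketch below illustrates that with a deliberately overlapping criteria list (the pair ["2017", "q1"] is a hypothetical example, not the asker's list) and removes the repeats while keeping the original order.

```python
pLst = ['CO_002_2016_Q4_Merge.loc',
        'CO_002_2017_here_2017_q1_streets_parity.loc',
        'CA_ORG_BCP.loc']

# Hypothetical overlapping criteria: both strings occur in the second name.
exCrt = ["2017", "q1"]

# Nested comprehension: one entry per (file, matching criterion) pair.
nested = [i for i in pLst for j in exCrt if j in i]

# dict.fromkeys keeps the first occurrence of each name, in order.
deduped = list(dict.fromkeys(nested))

print(nested)
print(deduped)
```

With the asker's own criteria no name matches twice, so the repeats only matter if the criteria list grows.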

Upvotes: 1

Ajax1234

Reputation: 71461

You can use a list comprehension:

pLst = ['CO_002_2016_Q4_Merge.loc', 'CO_002_2016_Merge.zip', 'CO_002_2016_q4_alias.loc', 'CO_002_2017_here_2017_q1_streets_alias.loc.xml', 'CO_002_2017_here_2017_q1_streets_parity.loc', 'AuburnAliasGCS_1984_1106.lox', 'CA_ORG_BCP.loc.xml', 'CA_ORG_BCP.loc', 'Co52 Alias Address Locator.lox', 'CO_002_2017_here_2017_q1_streets_parity.loc.xml', 'CentralCostaCountyStreets.lox', 'CO_002_2016_q4_alias.lox']

exCrt = ["2016_Q4", "2016_q4", "2017"]
final_pLst = [i for i in pLst if any(b in i for b in exCrt)]

Output:

['CO_002_2016_Q4_Merge.loc', 'CO_002_2016_q4_alias.loc', 'CO_002_2017_here_2017_q1_streets_alias.loc.xml', 'CO_002_2017_here_2017_q1_streets_parity.loc', 'CO_002_2017_here_2017_q1_streets_parity.loc.xml', 'CO_002_2016_q4_alias.lox']
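If letter case is the only difference between criteria such as "2016_Q4" and "2016_q4", a possible variation is to compare lowercased strings so that one criterion covers both spellings. This is only a sketch: the shortened criteria list and the name matches are assumptions for illustration, not part of the answer above.

```python
pLst = ['CO_002_2016_Q4_Merge.loc', 'CO_002_2016_Merge.zip',
        'CO_002_2016_q4_alias.loc',
        'CO_002_2017_here_2017_q1_streets_alias.loc.xml',
        'CO_002_2017_here_2017_q1_streets_parity.loc',
        'AuburnAliasGCS_1984_1106.lox', 'CA_ORG_BCP.loc.xml',
        'CA_ORG_BCP.loc', 'Co52 Alias Address Locator.lox',
        'CO_002_2017_here_2017_q1_streets_parity.loc.xml',
        'CentralCostaCountyStreets.lox', 'CO_002_2016_q4_alias.lox']

# Assumed criteria: a single lowercase entry replaces "2016_Q4" and "2016_q4".
exCrt = ["2016_q4", "2017"]

# Lowercase each name before the substring test; keep the original spelling.
matches = [name for name in pLst
           if any(c.lower() in name.lower() for c in exCrt)]

print(matches)
```

The original file names are kept untouched in the result; only the comparison is case-insensitive.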

Upvotes: 1

grand_chat

Reputation: 541

Your approach using any will work if you test each criterion against each file name rather than against the whole list:

exCrt = ["2016_Q4", "2016_q4", "2017"]
exLst = []
for p in pLst:
    if any(x in p for x in exCrt):
        exLst.append(p)
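To see why the two attempts in the question behaved as they did, here is a small sketch. The variable names crit, whole_list_test, per_name_test and always_true are made up for illustration, and the two-element list is a shortened stand-in for pLst.

```python
pLst = ['CO_002_2016_Q4_Merge.loc', 'CA_ORG_BCP.loc']
crit = "2016_Q4"

# `in` on a list tests for an element equal to crit, not for a substring,
# so testing against the whole list finds nothing.
whole_list_test = crit in pLst

# `in` on a string tests for a substring, which is what is wanted here.
per_name_test = [crit in p for p in pLst]

# A non-empty string is truthy, so `"2016" and "q4"` evaluates to "q4".
# An `or` chain that starts with it is therefore true for every file.
always_true = bool("2016" and "q4")

print(whole_list_test, per_name_test, always_true)
```

This suggests the first attempt never matched because it compared criteria with whole list elements, and the second attempt accepted every file because its first operand was always truthy.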

Upvotes: 2
