kdubs

Reputation: 946

Searching a list of strings for multiple keywords

I have two Python lists: one is a list of keywords and the other is a list of file names. I need to process the file names based on the keywords. I want Python to match each file name against the keywords and then perform an operation that depends on which keyword matched.

What I have looks like this:

keywords = ["_CMD_","_COMM_","_RETRANSMIT_"]
file_list = ['2B_CMD_2015.txt','2C_CMD_2015.txt','RETRANSMIT_2015.txt']

for f_name in file_list:
  for keyword in keywords:
    if keyword in f_name:
      #perform operation based on what keyword is matched
    else:
      #print an error

The problem is that, because the inner loop walks through every keyword, it prints an error for each keyword that does not match before it reaches the one that does. I only want an error printed when none of the keywords is found in the file name being checked.

I tried using any() but that seems to stop checking files after it finds a match. For example, using

for keyword in keywords:
  if any(keyword in f_name for f_name in file_list):
    print f_name
    print keyword

Returns

2B_CMD_2015.txt
_CMD_
2B_CMD_2015.txt
_RETRANSMIT_

Which isn't correct.

Edit: I also tried using regex, but I'm not sure I'm doing it the proper way:

for keyword in keywords:
  for item in wordlist:
    if re.search(keyword,item) is not None:
        print keyword
        print item
    else:
        print "nope"

Returns:

nope
nope
nope
_CMD_
2B_CMD_2015.txt
_CMD_
2C_CMD_2015.txt
nope
nope
nope
_RETRANSMIT_
_RETRANSMIT_2015.txt
nope
nope
nope

Can anyone help me out with this? I feel like it shouldn't be this difficult.

Upvotes: 4

Views: 922

Answers (5)

Robᵩ

Reputation: 168626

Consider using for-else instead of if-else:

for f_name in file_list:
  for keyword in keywords:
    if keyword in f_name:
      print "Found keyword %s in name %s"%(keyword, f_name)
      break
  else:
    print "Found no keyword"

Notice the indentation level: the else block belongs to the for, not the if. Also note that the if branch must end with break; without it, the else block runs even when a keyword was found.
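As a rough sketch of the same for-else idea, the loop can be wrapped in a small helper so the result is easy to inspect. The function name `first_keyword` is made up for this illustration, and `print()` is written in function form so the snippet also runs on Python 3. The lists are the ones from the question.

```python
def first_keyword(f_name, keywords):
    """Return the first keyword contained in f_name, or None if none match."""
    for keyword in keywords:
        if keyword in f_name:
            break  # a match: leaving the loop this way skips the else clause
    else:
        # Runs only when the loop finished without hitting break.
        keyword = None
    return keyword


keywords = ["_CMD_", "_COMM_", "_RETRANSMIT_"]
file_list = ['2B_CMD_2015.txt', '2C_CMD_2015.txt', 'RETRANSMIT_2015.txt']

results = [(f_name, first_keyword(f_name, keywords)) for f_name in file_list]
for f_name, keyword in results:
    if keyword is None:
        print("Found no keyword in name %s" % f_name)
    else:
        print("Found keyword %s in name %s" % (keyword, f_name))
```

With these exact lists, 'RETRANSMIT_2015.txt' falls into the no-keyword branch, because the keyword "_RETRANSMIT_" starts with an underscore that the file name does not have.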

Upvotes: 3

Frerich Raabe

Reputation: 94319

I suggest making keywords a list of tuples that pair each keyword with its handler. You can then use the for..else construct to handle files that match no keyword. Consider e.g.:

def handleCmd(fn):
    print "handleCmd: " + fn

def handleComm(fn):
    print "handleComm: " + fn

def handleRetransmit(fn):
    print "handleRetransmit: " + fn

keywords = [ ( "_CMD_", handleCmd ),
             ( "_COMM_", handleComm ),
             ( "RETRANSMIT_", handleRetransmit ),
           ]


file_list = ['2B_CMD_2015.txt','2C_CMD_2015.txt','RETRANSMIT_2015.txt','bogus.t>

for fn in file_list:
    for kw, handle in keywords:
        if kw in fn:
            handle(fn)
            break
    else:
        print "OH NOE"

This prints

handleCmd: 2B_CMD_2015.txt
handleCmd: 2C_CMD_2015.txt
handleRetransmit: RETRANSMIT_2015.txt
OH NOE

Upvotes: 0

TigerhawkT3

Reputation: 49318

The basic way to do this is to set a flag:

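for f_name in file_list:
    flag = False  # becomes True once any keyword matches this file
    for keyword in keywords:
        if keyword in f_name:
            flag = True
            # perform operation based on what keyword is matched
            print("matched %s in %s" % (keyword, f_name))
    if not flag:
        # no keyword matched this file name
        print("error: no keyword in %s" % f_name)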
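Since the flag loop has no break, it handles every keyword that occurs in a file name, not just the first one. A minimal sketch of that behaviour follows. It collects the matches in a list, so an empty list plays the role of the unset flag. The file names and the `report` dict are hypothetical and were chosen so that one name contains two keywords.

```python
keywords = ["_CMD_", "_COMM_", "_RETRANSMIT_"]
# Hypothetical file names for illustration only.
file_list = ["2B_CMD_2015.txt", "2C_CMD_COMM_2015.txt", "notes.txt"]

report = {}
for f_name in file_list:
    # Every keyword is checked; nothing stops at the first hit.
    matched = [keyword for keyword in keywords if keyword in f_name]
    report[f_name] = matched
    if not matched:
        # Same role as "if not flag" in the loop above.
        print("error: no keyword in %s" % f_name)
    for keyword in matched:
        print("matched %s in %s" % (keyword, f_name))
```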

Upvotes: 1

Reut Sharabani

Reputation: 31339

Filter the list using any and then use it:

keywords = ["_CMD_","_COMM_","_RETRANSMIT_"]
file_list = ['2B_CMD_2015.txt','2C_CMD_2015.txt','RETRANSMIT_2015.txt']
filtered = [file_name for file_name in file_list if any(keyword in file_name for keyword in keywords)]
if filtered:
    # do stuff with 'filtered'
    print("processing files...")
else:
    print("error")

Example:

>>> keywords = ["_CMD_","_COMM_","_RETRANSMIT_"]
>>> file_list = ['2B_CMD_2015.txt','2C_CMD_2015.txt','RETRANSMIT_2015.txt']
>>> filtered = [file_name for file_name in file_list if any(keyword in file_name for keyword in keywords)]
>>> filtered
['2B_CMD_2015.txt', '2C_CMD_2015.txt']
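Note that the filter above reports an error only when no file at all matches. If an error per unmatched file is wanted, one possible variation is to split the list in two. This is only a sketch; the helper name `has_keyword` and the `matched` / `unmatched` names are made up here.

```python
keywords = ["_CMD_", "_COMM_", "_RETRANSMIT_"]
file_list = ['2B_CMD_2015.txt', '2C_CMD_2015.txt', 'RETRANSMIT_2015.txt']


def has_keyword(file_name):
    """True if at least one keyword occurs in file_name."""
    return any(keyword in file_name for keyword in keywords)


# Split the file names into those with and without a keyword.
matched = [file_name for file_name in file_list if has_keyword(file_name)]
unmatched = [file_name for file_name in file_list if not has_keyword(file_name)]

for file_name in unmatched:
    print("error: no keyword in %s" % file_name)
```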

Upvotes: 0

iobender

Reputation: 3486

for-else can help you. The else clause executes only if the inner for loop runs to completion without hitting break, and break is reached only when a match is found. Note that this means only the first match is considered; the loop does not look for further matches.

keywords = ["_CMD_","_COMM_","_RETRANSMIT_"]
file_list = ['2B_CMD_2015.txt','2C_CMD_2015.txt','RETRANSMIT_2015.txt']

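for f_name in file_list:
  for keyword in keywords:
    if keyword in f_name:
      # perform operation based on what keyword is matched
      print("matched %s in %s" % (keyword, f_name))
      break
  else:
    # reached only when no keyword matched
    print("error: no keyword in %s" % f_name)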
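Because only the first match counts, the order of the keywords decides which one wins when a file name contains several. The sketch below shows this with the built-in next() and a default value, which is a different way to get the same first-match behaviour and is not the for-else loop itself. The function name `first_match` and the file name '2C_CMD_COMM_2015.txt' are hypothetical.

```python
def first_match(f_name, keywords):
    """Return the first keyword found in f_name, or None if there is none."""
    # next() pulls the first item from the generator; None is the fallback
    # when the generator yields nothing.
    return next((keyword for keyword in keywords if keyword in f_name), None)


# Hypothetical file name that contains two keywords.
f_name = '2C_CMD_COMM_2015.txt'

cmd_first = first_match(f_name, ["_CMD_", "_COMM_"])
comm_first = first_match(f_name, ["_COMM_", "_CMD_"])
no_match = first_match('RETRANSMIT_2015.txt', ["_CMD_", "_COMM_", "_RETRANSMIT_"])

print(cmd_first)
print(comm_first)
print(no_match)
```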

Upvotes: 1
