Reputation: 125
I wrote a script that goes through a list of strings and looks for matches in either (1) lines in a file or (2) filenames in a directory.
Depending on the search mode (user input), each string is appended to a list if (1) it matches the line/filename exactly or (2) the line/filename contains it.
import os
import operator
query_list = ["look", "for", "these"]
search_object = "name_of_file_or_directory"
if os.path.isfile(search_object):
input_object = 1
with open(search_object, "r") as file:
lines = file.read().split("\n")
elif os.path.isdir(search_object):
input_object = 2
search_mode = input("Are you looking for objects that (1) exactly match or (2) contain the query? ")
matched = []
def comparison():
if search_mode == "1" and operator.eq(string, query) or search_mode == "2" and operator.contains(string, query):
matched.append(string)
return matched
for query in query_list:
def search_loop(container):
# container is either list of filenames or list of lines
global string
for string in container:
matched = comparison()
return matched
if input_object == 1:
matched = search_loop(lines)
elif input_object == 2:
matched = search_loop(os.listdir(search_object))
print(matched)
I defined the comparison()
function so I can use it more often later in the script without repeating those lines. However, I had to assign global string
within the other function then, as I get the following error otherwise.
NameError: name 'string' is not defined
I'm wondering how I could avoid using a global variable. I read that classes may often be used to avoid such, but I couldn't figure out how this would be useful here. I'd appreciate any recommendations on how to solve this task in a nice way.
Upvotes: 1
Views: 491
Reputation: 4030
There are a few things we can do to clean up the code. We'll end up getting rid of the global variable by the end.
Instead of waiting until the end to get the directory contents, we can get it at the very beginning. This way we can treat the lines of text and the directory contents exactly the same later on.
import os
import operator
query_list = ["look", "for", "these"]
search_object = "name_of_file_or_directory"
if os.path.isfile(search_object):
with open(search_object, "r") as file:
lines = file.read().split("\n")
elif os.path.isdir(search_object):
lines = os.listdir(search_object)
# At this point, lines is just a list of text - we don't care where it came from.
search_mode = input("Are you looking for objects that (1) exactly match or (2) contain the query? ")
matched = []
def comparison():
if search_mode == "1" and operator.eq(string, query) or search_mode == "2" and operator.contains(string, query):
matched.append(string)
return matched
for query in query_list:
def search_loop(container):
# container is either list of filenames or list of lines
global string
for string in container:
matched = comparison()
return matched
# We were able to remove checks from here
matched = search_loop(lines)
print(matched)
Before, we were checking each time which kind of operation to do (exact match or contains). Instead, we can select the option right when we get the user input and assign it to a single name to be used later on.
import os
import operator
query_list = ["look", "for", "these"]
search_object = "name_of_file_or_directory"
if os.path.isfile(search_object):
with open(search_object, "r") as file:
lines = file.read().split("\n")
elif os.path.isdir(search_object):
lines = os.listdir(search_object)
search_mode = input("Are you looking for objects that (1) exactly match or (2) contain the query? ")
# Because the interface of both functions is the same we can use them directly.
# If they were not the same then we could write an adapter function that would make them have
# the same signatures so they could be used in the same way below.
if search_mode == "1":
search_op = operator.eq
elif search_mode == "2":
search_op = operator.contains
matched = []
for query in query_list:
for string in lines:
if search_op(string, query):
matched.append(string)
print(matched)
Now that we have a better picture of the parts of the program, we can break it into functions. Using functions gives parts of the program concrete names and also helps narrow the scope of the variables they use. The fewer variables used in any piece of code the easier it is to understand later. We avoid global variables by choosing our functions carefully - one important criteria we might use is that everything in the function is self-contained and doesn't need to reference anything outside.
We'll also use the if __name__ == '__main__'
convention so we could include our script from another script if we wanted to.
import os
import operator
def get_contents(search_object):
if os.path.isfile(search_object):
with open(search_object, "r") as file:
return file.read().split("\n")
elif os.path.isdir(search_object):
return os.listdir(search_object)
def get_search_function(user_input):
if search_mode == "1":
return operator.eq
elif search_mode == "2":
return operator.contains
def find_matches(match_function, query_list, lines):
matched = []
for query in query_list:
for string in lines:
if match_function(string, query):
matched.append(string)
return matched
if __name__ == "__main__":
search_object = "name_of_file_or_directory"
lines = get_contents(search_object)
search_mode_input = input("Are you looking for objects that (1) exactly match or (2) contain the query? ")
search_function = get_search_function(search_mode_input)
query_list = ["look", "for", "these"]
matches = find_matches(search_function, query_list, lines)
print(matches)
raise
an appropriate error (for something that can be handled) or assert
(for something that we don't expect to happen and cannot really recover from).argparse
can make it easier to take arguments into the script so we don't have to hard-code variable values.sys.stdin
and leave it to the user to generate text somehow and pipe it to the program, like: cat words.txt | python script.py
for contents or ls . | python script.py
for files in a directory.Upvotes: 1