Reputation: 3355
Which one is a better design in the following scenario, and why?
A:
stop_words = ['com1', 'com2']
def clean_text(text_tokens, stop_words):
    return [token for token in text_tokens if token not in stop_words]
clean_text(['hello', 'world', 'com1', 'com2'], stop_words)
B:
def clean_text(text_tokens):
    stop_words = ['com1', 'com2']
    return [token for token in text_tokens if token not in stop_words]
clean_text(['hello', 'world', 'com1', 'com2'])
C:
STOP_WORDS = ['com1', 'com2']
def clean_text(text_tokens):
    return [token for token in text_tokens if token not in STOP_WORDS]
clean_text(['hello', 'world', 'com1', 'com2'])
Added the C version based on @MisterMiyagi's answer.
Note1: In this context, stop_words is fixed and does not change.
Note2: stop_words can be a small or a very large list.
Upvotes: 0
Views: 1335
Reputation: 52099
Prefer to create constants at global scope. The global scope is evaluated once, whereas function-local scope is evaluated on each function call.
For very large searches, prefer to use a set due to its O(1) lookup, versus the list's O(n) lookup. Values that are intended as constants should be named with ALL_CAPS_NAMES. Functions should directly reference constants iff they are not meant to be replaced.
STOP_WORDS = {'com1', 'com2'}  # constant set of words

def clean_text(text_tokens):
    return [token for token in text_tokens if token not in STOP_WORDS]
    # directly access constant ^

clean_text(['hello', 'world', 'com1', 'com2'])
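To see the lookup difference in practice, here is a minimal timing sketch (the corpus size, word names, and iteration count are my own choices, not from the question; absolute numbers will vary by machine, but the set should win decisively):

import timeit

# Build a large collection of dummy stop words, once as a list and once as a set.
words = [f'word{i}' for i in range(10_000)]
STOP_LIST = list(words)  # O(n) scan per membership test
STOP_SET = set(words)    # O(1) hash lookup per membership test

# Probe for the worst-case list element (the last one).
print(timeit.timeit("'word9999' in STOP_LIST", globals=globals(), number=1_000))
print(timeit.timeit("'word9999' in STOP_SET", globals=globals(), number=1_000))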
For small constants, it may be advantageous to provide them as a literal. Even CPython is able to optimise inline literals to actual constants.
def clean_text(text_tokens):
    return [
        token
        for token in text_tokens
        if token not in {'com1', 'com2'}
        # ^ compiled as LOAD_CONST (frozenset({'com2', 'com1'}))
    ]

clean_text(['hello', 'world', 'com1', 'com2'])
The current optimiser converts list and tuple literals to tuple constants, and set and frozenset literals to frozenset constants.
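You can verify this yourself with the standard dis module (a sketch, assuming a current CPython; the exact bytecode layout differs between versions, but the frozenset constant appears in each):

import dis

def clean_text(text_tokens):
    return [token for token in text_tokens if token not in {'com1', 'com2'}]

# The disassembly of the comprehension includes a line such as
#   LOAD_CONST  frozenset({'com1', 'com2'})
# showing that the set literal was folded into a frozenset constant.
dis.dis(clean_text)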
Upvotes: 4
Reputation: 59201
Middle ground: use a default value for the argument.
def clean_text(text_tokens, stop_words={'com1', 'com2'}):
    return [token for token in text_tokens if token not in stop_words]
clean_text(['hello', 'world', 'com1', 'com2'])
Now the constant {'com1', 'com2'} is only created once (when the function is defined); it doesn't pollute the global scope; and if you end up wanting to, you can optionally pass different stop_words when you call clean_text.
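For example (a call of my own, with a made-up stop word, not from the question):

# Override the default stop words for a single call:
clean_text(['hello', 'world', 'com1', 'foo'], stop_words={'foo'})
# -> ['hello', 'world', 'com1']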
Upvotes: 5
Reputation: 100
Scenario A is better if you want to pass a different list of stop_words as a parameter each time you call the function, while Scenario B only tests against ['com1', 'com2'], which means you can only change that list by editing the function itself.
In conclusion: Scenario A is better for testing different lists and passing them as parameters to the function.
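A sketch of that flexibility, using Scenario A's signature with stop-word lists of my own invention:

clean_text(['hello', 'world', 'com1', 'com2'], ['com1'])
# -> ['hello', 'world', 'com2']
clean_text(['hello', 'world', 'com1', 'com2'], ['com1', 'com2'])
# -> ['hello', 'world']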
Upvotes: 0