Reputation: 183
Here is the code I wrote in a project.
Which one is a better one to write python?
def get_list_of_university_towns():
....
def parse_state(item):
return re.sub('\[edit\]', '', item)
uni_towns['State'] = uni_towns['State'].apply(parse_state)
return uni_towns
Or:
def parse_state(item):
return re.sub('\[edit\]', '', item)
def get_list_of_university_towns():
....
uni_towns['State'] = uni_towns['State'].apply(parse_state)
return uni_towns
This "parse_state(item)" function is only called once in "get_list_of_university_towns()" and will be never used again. Personally I think define it inside a function will be easier to understand. However, I barely see this kind of codes in other people's project.
So, how should I write this piece of code?
Upvotes: 5
Views: 2202
Reputation: 5286
Yes, it is. Actually, it is more Pythonic to do it inside than outside in order not to pollute the module namespace.
The option with the function definition inside the other function works. Another Pythonic way would be to use an anonymous lambda function:
def get_list_of_university_towns():
....
uni_towns['State'] = uni_towns['State'].apply(lambda item: re.sub('\[edit\]', '', item))
return uni_towns
As it has been suggested, and now that you said it is a panda's dataframe which means that the function will be called more than once, you should either compile the expresion or use str.replace()
instead of re.sub()
:
def get_list_of_university_towns():
....
uni_towns['State'] = uni_towns['State'].apply(lambda item: item.replace('[edit]', ''))
return uni_towns
Upvotes: 7
Reputation: 341
I agree with Adirio's answer if a function is needed.
Alternatively, you could consider whether a function is really necessary if it's only used once. If it is possible to iterate over uni_towns['State'], this would achieve the same and is in my opinion more readable:
def get_list_of_university_towns():
....
for state in uni_towns['State']:
state = re.sub('\[edit\]', '', state)
return uni_towns
Upvotes: 0
Reputation: 359
You don't need to create another function for what you are trying to achieve. Creating a function outside is better so that you can call it in other functions as well rather than limiting the use of it to only to where it is written.
Upvotes: -1
Reputation: 1831
Both are fine. The first one is cleaner as you do not pollute module namespace with a name that is not used elsewhere.
And the first form should be slightly faster as parse_state
is a local variable when it is called, especially important if it is used within a loop. There is no runtime cost to defining the function inside another function. At runtime, it is a simple assignment.
However, if it is used inside a loop, you should also compile the regular expression in the module scope:
_state_re = re.compile('\[edit\]')
def get_list_of_university_towns():
...
def parse_state(item):
return _state_re.sub('', item)
uni_towns['State'] = uni_towns['State'].apply(parse_state)
return uni_towns
Upvotes: 0