Reputation: 10882
I have a function to port from another language, could you please help me make it "pythonic"?
Here the function ported in a "non-pythonic" way (this is a bit of an artificial example - every task is associated with a project or "None", we need a list of distinct projects, distinct meaning no duplication of the .identifier property, starting from a list of tasks):
@staticmethod
def get_projects_of_tasks(task_list):
projects = []
project_identifiers_seen = {}
for task in task_list:
project = task.project
if project is None:
continue
project_identifier = project.identifier
if project_identifiers_seen.has_key(project_identifier):
continue
project_identifiers_seen[project_identifier] = True
projects.append(project)
return projects
I have specifically not even started to make it "pythonic" not to start off on the wrong foot (e.g. list comprehension with "if project.identifier is not None, filter() based on predicate that looks up the dictionary-based registry of identifiers, using set() to strip duplicates, etc.)
EDIT:
Based on the feedback, I have this:
@staticmethod
def get_projects_of_tasks(task_list):
projects = []
project_identifiers_seen = set()
for task in task_list:
project = task.project
if project is None:
continue
project_identifier = project.identifier
if project_identifier in project_identifiers_seen:
continue
project_identifiers_seen.add(project_identifier)
projects.append(project)
return projects
Upvotes: 4
Views: 216
Reputation: 151047
There are already a lot of good answers, and, indeed, you have accepted one! But I thought I would add one more option. A number of people have seen that your code could be made more compact with generator expressions or list comprehensions. I'm going to suggest a hybrid style that uses generator expressions to do the initial filtering, while maintaining your for
loop in the final filter.
The advantage of this style over the style of your original code is that it simplifies the flow of control by eliminating continue
statements. The advantage of this style over a single list comprehension is that it avoids multiple accesses to task.project.identifier
in a natural way. It also handles mutable state (the seen
set) transparently, which I think is important.
def get_projects_of_tasks(task_list):
projects = (task.project for task in task_list)
ids_projects = ((p.identifier, p) for p in projects if p is not None)
seen = set()
unique_projects = []
for id, p in ids_projects:
if id not in seen:
seen.add(id)
unique_projects.append(p)
return unique_projects
Because these are generator expressions (enclosed in parenthesis instead of brackets), they don't build temporary lists. The first generator expression creates an iterable of projects; you could think of it as performing the project = task.project
line from your original code on all the projects at once. The second generator expression creates an iterable of (project_id, project)
tuples. The if
clause at the end filters out the None
values; (p.identifier, p)
is only evaluated if p
passes through the filter. Together, these two generator expressions do away with your first two if
blocks. The remaining code is essentially the same as yours.
Note also the excellent suggestion from Marcin/delnan that you create a generator using yield
. This cuts down further on the verbosity of your code, boiling it down to its essentials:
def get_projects_of_tasks(task_list):
projects = (task.project for task in task_list)
ids_projects = ((p.identifier, p) for p in projects if p is not None)
seen = set()
for id, p in ids_projects:
if id not in seen:
seen.add(id)
yield p
The only disadvantage -- in case this isn't obvious -- is that if you want to permanently store the projects, you have to pass the resulting iterable to list
.
projects_of_tasks = list(get_projects_of_tasks(task_list))
Upvotes: 1
Reputation: 49856
def get_projects_of_tasks(task_list):
seen = set()
return [seen.add(task.project.identifier) or task.project #add is always None
for task in task_list if
task.project is not None and task.project.identifier not in seen]
This works because (a) add
returns None
(and or
returns the value of the last expression evaluated) and (b) the mapping clause (the first clause) is only executed if the if clause is True
.
There is no reason that it has to be in a list comprehension - you could just as well set it out as a loop, and indeed you may prefer to. This way has the advantage that it is clear that you are just building a list, and what is supposed to be in it.
I've not used staticmethod
because there is rarely a need for it. Either have this as a module-level function, or a classmethod
.
An alternative is a generator (thanks to @delnan for point this out):
def get_projects_of_tasks(task_list):
seen = set()
for task in task_list:
if task.project is not None and task.project.identifier not in seen:
identifier = task.project.identifier
seen.add(identifier)
yield task.project
This eliminates the need for a side-effect in comprehension (which is controversial), but keeps clear what is being collected.
For the sake of avoiding another if/continue construction, I have left in two accesses to task.project.identifier
. This could be conveniently eliminated with the use of a promise library.
This version uses promises to avoid repeated access to task.project.identifier without the need to include an if/continue:
from peak.util.proxies import LazyProxy, get_cache # pip install ProxyTypes
def get_projects_of_tasks(task_list):
seen = set()
for task in task_list:
identifier = LazyProxy(lambda:task.project.identifier) # a transparent promise
if task.project is not None and identifier not in seen:
seen.add(identifier)
yield task.project
This is safe from AttributeErrors because task.project.identifier
is never accessed before task.project
is checked.
Upvotes: 3
Reputation: 23140
What about:
project_list = {task.project.identifier:task.project for task in task_list if task.project is not None}
return project_list.values()
For 2.6- use dict constructor instead:
return dict((x.project.id, x.project) for x in task_list if x.project).values()
Upvotes: 3
Reputation: 69042
Some say EAFP is pythonic, so:
@staticmethod
def get_projects_of_tasks(task_list):
projects = {}
for task in task_list:
try:
if not task.project.identifier in projects:
projects[task.project.identifier] = task.project
except AttributeError:
pass
return projects.values()
of cours an explicit check wouldn't be wrong either, and would of course be better if many tasks have not project.
And just one dict to keep track of seen identifiers and projects would be enough, if the order of the projects matters, then a OrderedDict
(python2.7+) could come in handy.
Upvotes: 1
Reputation: 599748
There's nothing massively unPythonic about this code. A couple of possible improvements:
project_identifiers_seen
could be a set, rather than a dictionary.foo.has_key(bar)
is better spelled bar in foo
staticmethod
of a class. Usually there's no need for a class in Python unless you're actually doing data encapsulation. If this is just a normal function, make it a module-level one.Upvotes: 7