Reputation: 3565
When I want a list from a DataFrame column (pandas 1.0.1), I can do:
df['column'].to_list()
or I can use:
list(df['column'])
Both alternatives work well, but what are the differences between them?
Is one method better than the other?
Upvotes: 8
Views: 1621
Reputation: 59274
list receives an iterable and returns a pure Python list. It is the built-in Python way to convert any iterable into a list.
to_list is a method of the core pandas object classes that converts those objects to pure Python lists. The difference is that it is implemented by the pandas core developers, who may optimize the process as they see fit and/or add extra functionality to the conversion that a plain list(...) wouldn't provide.
For example, the source code for this piece is:
def tolist(self):
    '''(...)
    '''
    if self.dtype.kind in ["m", "M"]:
        return [com.maybe_box_datetimelike(x) for x in self._values]
    elif is_extension_array_dtype(self._values):
        return list(self._values)
    else:
        return self._values.tolist()
Which basically means that to_list will likely end up using one of three things: a normal list comprehension (analogous to list(...), but ensuring that the final objects are of pandas' datetime-like types instead of Python's datetime); a straight pure list(...) conversion; or numpy's tolist() implementation.
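To make the branches above concrete, here is a minimal sketch on a small made-up DataFrame (the column names "n" and "when" are illustrative assumptions, not from the question). It only shows that both spellings give equal lists for an integer column, and that a datetime column comes back as pandas Timestamp objects:

```python
import pandas as pd

# Hypothetical example data: one integer column, one datetime column.
df = pd.DataFrame({
    "n": [1, 2, 3],
    "when": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"]),
})

# Integer column: to_list() delegates to numpy's tolist().
ints_method = df["n"].to_list()
ints_builtin = list(df["n"])

# Datetime column: elements are boxed as pandas Timestamp objects.
dates = df["when"].to_list()

print(ints_method == ints_builtin)
print(type(dates[0]))
```

For everyday columns like these, the choice between the two is mostly a matter of style.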
The differences between the latter and Python's list(...) can be found in this thread.
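As a rough illustration of that last comparison, the sketch below applies both conversions directly to numpy arrays (the array values are made up for the example). On a plain numpy array, list(...) keeps the elements as numpy scalars and only unpacks the outermost dimension, while tolist() converts to Python scalars and recurses into nested dimensions:

```python
import numpy as np

arr = np.array([1, 2, 3])

via_list = list(arr)        # elements remain numpy integer scalars
via_tolist = arr.tolist()   # elements become built-in Python ints

grid = np.array([[1, 2], [3, 4]])

rows_via_list = list(grid)      # a list of 1-D numpy arrays
rows_via_tolist = grid.tolist() # a nested list of Python ints

print(type(via_list[0]), type(via_tolist[0]))
print(type(rows_via_list[0]), type(rows_via_tolist[0]))
```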
Upvotes: 10