Michal Kaut

Reputation: 1553

pandas dataframe index: to_list() vs tolist()

I recently wrote a Python script for someone, in which I converted a pandas DataFrame's index into a list using to_list(). However, this does not work for them: with their Python interpreter, they get AttributeError: 'Index' object has no attribute 'to_list'.

I did some searching and found that there is also tolist(), which seems to do the same as to_list(): searching the pandas documentation finds both, with word-for-word identical descriptions. On the other hand, the documentation of Index mentions only to_list().

So I wonder whether there is a difference between the two.

Upvotes: 24

Views: 43543

Answers (6)

mirekphd

Reputation: 6763

I'm afraid pandas has created some confusion here, because .to_list() is not (or is no longer) an alias in NumPy: calling it on an array raises an AttributeError. So for NumPy arrays you should use .tolist() (and the other "to" methods) without an underscore. While I'm not denying that a to_list alias might once have worked in NumPy, it is undocumented as far back as the NumPy docs on numpy.ndarray.tolist go, i.e. v1.13 (released in 2017).

Upvotes: -1

Ramin Eslami

Reputation: 21

In Python:
tolist() is a method of NumPy arrays that returns a list representation of the array.
to_list() is a method of pandas Series and Index objects (not of DataFrames themselves) that returns a list representation of their values.

Although both methods return the same kind of output, they differ in origin and compatibility: to_list() is pandas-specific.

Upvotes: 1

Himanshu Shrimalve

Reputation: 19

From the answers above, all we can conclude is that to_list() is just an alias of the original tolist(). I did notice one difference, though: when converting the values of a pandas.DataFrame (which are a numpy.ndarray) to a list, to_list raises AttributeError: 'numpy.ndarray' object has no attribute 'to_list', whereas tolist does the job.

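import pandas as pd

# Named "data" rather than "dict" to avoid shadowing the built-in type
data = {'Name': ['Martha', 'Tim', 'Rob', 'Georgia'],
        'Maths': [87, 91, 97, 95],
        'Science': [83, 99, 84, 76]
       }
df = pd.DataFrame(data)

# df.values is a numpy.ndarray, which has no to_list(), so this raises
df.values.to_list()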

output:

Traceback (most recent call last):

  File "C:\Users\hshrima\AppData\Local\Temp/ipykernel_24028/1619690827.py", line 1, in <module>
    df.values[0].to_list()

AttributeError: 'numpy.ndarray' object has no attribute 'to_list'

But if I use tolist, the result is as follows:

df.values.tolist()
[['Martha', 87, 83], ['Tim', 91, 99], ['Rob', 97, 84], ['Georgia', 95, 76]]

Here, tolist refers to ndarray.tolist, whereas the to_list discussed in the other answers is an alias of IndexOpsMixin.tolist from pandas.
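A script that has to cope with objects offering only one of the two spellings could hide the difference behind a small helper. The following is only a sketch under stated assumptions: as_list is a hypothetical name, and array.array from the standard library stands in for a NumPy array, since it likewise provides tolist() but no to_list().

```python
from array import array


def as_list(obj):
    """Convert obj to a plain list (as_list is a hypothetical helper name)."""
    # Try the older, more widely available spelling first, then the newer alias.
    for name in ("tolist", "to_list"):
        method = getattr(obj, name, None)
        if callable(method):
            return method()
    # Fall back to plain iteration for anything else.
    return list(obj)


# array.array has tolist() but no to_list(), like numpy.ndarray.
numbers = array("i", [87, 91, 97, 95])
from_array = as_list(numbers)

# A tuple has neither method, so the helper falls back to list().
from_tuple = as_list(("Martha", "Tim"))
```

Preferring tolist() is a deliberate choice here: judging by this thread, it is the spelling that both NumPy arrays and older pandas versions understand.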

Upvotes: 1

buran

Reputation: 14233

If you check the source code, you will see that right after the tolist() code there is the line to_list = tolist, so to_list is just an alias for tolist.

EDIT: to_list() was added in version 0.24.0; see issue #8826.
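To illustrate what a line like to_list = tolist does inside a class body, here is a minimal sketch. Box is a hypothetical class invented for this example, not pandas code; it only mimics the aliasing pattern.

```python
class Box:
    """Hypothetical container; it only mimics the aliasing pattern."""

    def __init__(self, values):
        self._values = list(values)

    def tolist(self):
        """Return the stored values as a plain Python list."""
        return list(self._values)

    # Binding a second name to the same function object creates the alias.
    to_list = tolist


box = Box([1, 2, 3])

# Both names resolve to one and the same function.
same_function = Box.to_list is Box.tolist
```

Because both names point at the same function object, the two calls cannot drift apart in behaviour; the only difference is which name exists in a given version.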

Upvotes: 37

Michal Kaut

Reputation: 1553

Expanding on buran's answer (too much text to fit into a comment): using git blame on the pointed-out source code, one can see that to_list() was indeed added in December 2018 as an alias to tolist(). It was committed as an enhancement to resolve issue 8826, i.e., to make it more in line with to_period() and to_timestamp().

Moreover, changes and comments in pandas/core/series.py show that the original tolist() is "semi-deprecated":

# tolist is not actually deprecated, just suppressed in the __dir__
_deprecations = generic.NDFrame._deprecations | frozenset(
    ['asobject', 'reshape', 'get_value', 'set_value',
-    'from_csv', 'valid'])
+    'from_csv', 'valid', 'tolist'])

Interesting to see how the system works...
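The idea of "suppressed in the __dir__" can be sketched in plain Python. The Quiet class and its _hidden attribute below are hypothetical and are not how pandas implements it; they only show that a name can be left out of dir() while remaining callable.

```python
class Quiet:
    """Hypothetical class that hides one method name from dir()."""

    # Names to leave out of the dir() listing (hypothetical attribute).
    _hidden = frozenset(["tolist"])

    def tolist(self):
        return []

    # Alias that stays visible in dir().
    to_list = tolist

    def __dir__(self):
        # Start from the default listing and drop the suppressed names.
        return sorted(set(super().__dir__()) - self._hidden)


q = Quiet()
listed = dir(q)

# The hidden method is still fully usable.
still_callable = q.tolist()
```

Hiding a name this way mainly affects tab completion and introspection, which fits the comment's point that tolist is not actually deprecated.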

Upvotes: 10

edornd

Reputation: 461

Answering this question with certainty would take a pandas developer, but a reasonable guess is that having both names serves to support legacy versions. As you can see from the source code here, the 'source' button for to_list leads to the same source code as tolist, so the two functions are simply aliases.

Upvotes: 4
