Reputation: 453
So .loc and .iloc are not your typical functions. They somehow use [ and ] to surround the arguments so that it is comparable to normal array indexing. However, I have never seen this in another library (that I can think of; maybe numpy has something like this that I'm blanking on), and I have no idea how it technically works or is defined in the Python code.
Are the brackets in this case just syntactic sugar for a function call? If so, how would one make an arbitrary function use brackets instead of parentheses? Otherwise, what is special about their use/definition in Pandas?
Upvotes: 30
Views: 4806
Reputation: 19123
Note: The first part of this answer is a direct adaptation of my answer to this other question, that was answered before this question was reopened. I expand on the "why" in the second part.
So .loc and .iloc are not your typical functions
Indeed, they are not functions at all. I'll make examples with loc; iloc is analogous (it uses different internal classes).
The simplest way to check what loc actually is, is:
import pandas as pd
df = pd.DataFrame()
print(df.loc.__class__)
which prints
<class 'pandas.core.indexing._LocIndexer'>
This tells us that df.loc is an instance of the _LocIndexer class. The syntax loc[] derives from the fact that _LocIndexer defines __getitem__ and __setitem__*, which are the methods Python calls whenever you use the square-bracket syntax.
So yes, brackets are, technically, syntactic sugar for a method call, just not a call to the function you thought it was. (There are of course many reasons why Python is designed this way; I won't go into the details here because 1) I am not enough of an expert to give an exhaustive answer and 2) there are plenty of better resources on the web about this topic.)
*Technically, it's its base class _LocationIndexer that defines those methods; I'm simplifying a bit here.
Why does Pandas use square brackets with .loc and .iloc?
I'm entering speculative territory here, because I couldn't find any document that explicitly discusses this design choice in Pandas. However, I see at least two good reasons for choosing square brackets.
The first, and most important reason is: you simply can't do with a function call everything you do with the square-bracket notation, because assigning to a function call is a syntax error in python:
# contrived example to show this can't work
a = []

def f():
    global a
    return a

f().append(1)  # OK
f() = dict()   # SyntaxError: cannot assign to function call
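Because a syntax error stops the whole file from running, one way to check this without crashing a script is to hand the source text to the built-in compile(). The helper name compiles is made up for this sketch.

```python
def compiles(source):
    """Return True if `source` is syntactically valid Python."""
    try:
        compile(source, "<demo>", "exec")
        return True
    except SyntaxError:
        return False


# Assigning to a call is rejected by the parser before anything runs
call_target = compiles("f() = dict()")

# Assigning to a subscript of the call's result is perfectly valid syntax
subscript_target = compiles("f()[0] = dict()")
```

Here call_target ends up False and subscript_target ends up True: only the subscript form is allowed as an assignment target.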
Using round brackets for a "function" call invokes the underlying __call__ method (note that any instance of a class that defines __call__ is callable, so "function" call is a loose term: Python doesn't care whether something is a function or just behaves like one).
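As a small illustration of that point, an instance of a class that defines __call__ can be used with round brackets exactly like a function. Doubler is a made-up name for this sketch.

```python
class Doubler:
    """Hypothetical class whose instances behave like a function."""

    def __call__(self, x):
        return 2 * x


double = Doubler()

result = double(21)                         # round brackets invoke __call__
explicit = type(double).__call__(double, 21)  # the same call, spelled out
is_callable = callable(double)
```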
Using square brackets, instead, calls either __getitem__ or __setitem__ depending on where the expression appears (__setitem__ if it is the target on the left of an assignment operator, __getitem__ when the value is read). There is no way to mimic this behaviour with a function call: you'd need a separate setter method to modify the data in the dataframe, and even then the call couldn't appear on the left of an assignment:
# imaginary method-based alternative to the square bracket notation:
my_data = df.get_loc(my_index)
df.set_loc(my_index, my_data*2)
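To see which method Python picks in each position, here is a small sketch with a hypothetical Traced class that records every bracket access. It is not pandas code, just a way to observe the dispatch.

```python
class Traced:
    """Hypothetical container that logs how its brackets are used."""

    def __init__(self):
        self.calls = []
        self._data = {}

    def __getitem__(self, key):
        self.calls.append(("get", key))
        return self._data.get(key, 0)

    def __setitem__(self, key, value):
        self.calls.append(("set", key, value))
        self._data[key] = value


t = Traced()
t["x"] = 5     # assignment target: __setitem__
y = t["x"]     # plain read: __getitem__
t["x"] += 1    # augmented assignment: __getitem__, then __setitem__
```

After these three statements, t.calls holds one "set", one "get", and then a "get" followed by a "set" for the augmented assignment, which is something a pair of getter and setter methods could only express as two separate calls.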
This example brings me to the second reason: consistency. You can access elements of a DataFrame via square brackets:
something = df['a']
df['b'] = 2*something
When using loc you're still referring to some items in the DataFrame, so it's more consistent to use the same syntax instead of asking the user to call getter and setter functions (it's also, I believe, "more pythonic", but that's a fuzzy concept I'd rather stay away from).
Upvotes: 18
Reputation: 8927
Under the covers, both use the __setitem__ and __getitem__ methods.
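For what it's worth, the same mapping can be seen on a plain list through the standard operator module, which exposes the bracket operations as ordinary functions. This is only a sketch on a built-in type, not on a DataFrame.

```python
import operator

values = [10, 20, 30]

first = operator.getitem(values, 0)              # same as values[0]
operator.setitem(values, 1, 99)                  # same as values[1] = 99
last = type(values).__getitem__(values, 2)       # what values[2] resolves to
```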
Upvotes: 2