Reputation: 71600
Why does pandas need three ways to perform the same operation?
(I use multiplication for examples)
First way:
df['a'] * 5
Second way:
df['a'].mul(5)
Third way:
df['a'].__mul__(5)
Aren't two enough, without the need for mul? I was wondering whether it could work the normal way, as it does for an integer:
First way:
3 * 5
Second way:
(3).__mul__(5)
But with a regular integer, this:
(3).mul(5)
would break.
I am just curious: why do we need this many options in pandas? It's the same with addition, subtraction and division.
Upvotes: 2
Views: 362
Reputation: 19885
* and mul do the same thing, but __mul__ is different.
* and mul perform some checks before delegating to __mul__. There are two things that you should know about.
NotImplemented
There is a special singleton value, NotImplemented, that a class's __mul__ returns in cases where it cannot handle the other operand. This tells Python to try the other operand's __rmul__. If that fails too, a generic TypeError is raised. If you call __mul__ directly, you won't get this logic. Observe:
class TestClass:
    def __mul__(self, other):
        return NotImplemented

TestClass() * 1
Output:
TypeError: unsupported operand type(s) for *: 'TestClass' and 'int'
Compare that with this:
TestClass().__mul__(1)
Output:
NotImplemented
This is why, in general, you should avoid calling the dunder (magic) methods directly: you bypass certain checks that Python does.
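To make the fallback concrete, here is a minimal sketch in which the right operand does define __rmul__. The class names Left and Right are hypothetical and exist only for this illustration.

```python
class Left:
    def __mul__(self, other):
        # Signal that Left does not know how to multiply by `other`.
        return NotImplemented


class Right:
    def __rmul__(self, other):
        # Python calls this after Left.__mul__ returned NotImplemented.
        return "handled by Right.__rmul__"


via_operator = Left() * Right()       # the operator falls back to Right.__rmul__
via_dunder = Left().__mul__(Right())  # no fallback: the raw sentinel comes back

print(via_operator)  # handled by Right.__rmul__
print(via_dunder)    # NotImplemented
```

With the operator, the multiplication succeeds; with the direct call, you are left holding NotImplemented and have to deal with it yourself.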
The second thing concerns subclasses. When you evaluate something like Base() * Derived(), where Derived inherits from Base, you would expect Base.__mul__(Derived()) to be called first. This can pose problems, since Derived is more likely to know how to handle such situations.
Therefore, when you use *, Python checks whether the right operand's type is a subclass of the left's that provides its own __rmul__, and if so, calls the right operand's __rmul__ method first.
Observe:
class Base:
    def __mul__(self, other):
        print('base mul')

class Derived(Base):
    def __rmul__(self, other):
        print('derived rmul')

Base() * Derived()
Output:
derived rmul
Notice that even though Base.__mul__ does not return NotImplemented and can clearly handle an object of type Derived, Python doesn't even look at it first; it delegates to Derived.__rmul__ immediately.
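As a contrast, the sketch below adds a class that is not a subclass of Base, to show that the priority rule applies only to subclasses. Unrelated is a hypothetical name introduced for this illustration, and the methods return strings instead of printing so the results can be inspected.

```python
class Base:
    def __mul__(self, other):
        return "Base.__mul__"


class Derived(Base):
    def __rmul__(self, other):
        return "Derived.__rmul__"


class Unrelated:
    def __rmul__(self, other):
        return "Unrelated.__rmul__"


# Right operand is a subclass with its own __rmul__: it is tried first.
subclass_on_right = Base() * Derived()

# Right operand is not a subclass: the left operand's __mul__ goes first,
# and since it does not return NotImplemented, __rmul__ is never called.
unrelated_on_right = Base() * Unrelated()

print(subclass_on_right)   # Derived.__rmul__
print(unrelated_on_right)  # Base.__mul__
```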
For completeness, there is one difference between * and mul in the context of pandas: mul is a function, and can therefore be passed around in a variable and used independently. For example:
import pandas as pd
pandas_mul = pd.DataFrame.mul
pandas_mul(pd.DataFrame([[1]]), pd.DataFrame([[2]]))
On the other hand, this will fail:
*(pd.DataFrame([[1]]), pd.DataFrame([[2]]))
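If you need a function form of * itself, the standard library's operator.mul is one option. The sketch below uses plain integers to stay self-contained; since operator.mul(a, b) simply evaluates a * b, it goes through the same NotImplemented and subclass checks described above, and it should behave the same way with pandas objects.

```python
import operator
from functools import reduce

multiply = operator.mul  # a plain function that evaluates a * b

product = multiply(3, 5)                              # same as 3 * 5
running = reduce(multiply, [1, 2, 3, 4])              # ((1 * 2) * 3) * 4
pairwise = list(map(multiply, [1, 2, 3], [4, 5, 6]))  # element by element

print(product)   # 15
print(running)   # 24
print(pairwise)  # [4, 10, 18]
```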
Upvotes: 3
Reputation: 611
First off, the third way (df['a'].__mul__(5)) should never be used, since it's an internal method that Python calls for you when it evaluates *. In general, users don't touch any of the "dunder" methods.
Regarding the other two ways, the first way is obvious; you just multiply the thing. It's standard math.
The second way gets a bit more interesting. One example of how I've used that method is when the function you want to apply is a variable.
For example:
def pandas_math(series, func, val):
    return getattr(series, func)(val)
pandas_math(df['a'], 'mul', 5)
will give the same result as df['a'].mul(5), but now you can pass mul as a variable, or whatever other function you want to use. It's much easier than hard-coding all the symbols.
Upvotes: 1
Reputation: 42746
Both the "magic method" __mul__ and the operator * are the same in the underlying Python (* just calls __mul__), and as you pointed out, that is the standardized way Python handles things. The other one, mul, is a method that you can use for mapping (with map), avoiding a lambda x, y: x*y, for example.
Yes, you could still use __mul__, but those methods (__x__) are usually not meant to be used as normal functions, and a simple mul makes the code clearer.
So, you don't really "need" it, but it is nice to have and use.
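One more reason the method form is nice to have: unlike the operator, it accepts keyword arguments. A minimal sketch, assuming two Series whose indexes only partly overlap (the data here is made up for illustration), using the fill_value parameter of Series.mul:

```python
import pandas as pd

a = pd.Series([1.0, 2.0, 3.0], index=["x", "y", "z"])
b = pd.Series([10.0, 20.0], index=["x", "y"])

# The operator aligns on the index; "z" has no partner in b, so it becomes NaN.
plain = a * b

# The method can substitute a value for the missing partner before multiplying.
filled = a.mul(b, fill_value=1)

print(plain)
print(filled)
```

Here plain["z"] is NaN, while filled["z"] is 3.0 because the missing entry in b is treated as 1.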
Upvotes: 1