Reputation: 305
The Python function max(3, 6) works in the pyspark shell, but if it is put in an application and submitted, it throws an error: TypeError: _() takes exactly 1 argument (2 given)
Upvotes: 4
Views: 4792
Reputation: 468
If you get this error even after verifying that you have NOT used from pyspark.sql.functions import *, then try the following: use import builtins as py_builtin and then call the built-in with that prefix, e.g. py_builtin.max().
*Adding David Arenburg's and user3610141's comments as an answer, as that is what helped me fix my problem in Databricks, where there was a name collision between pyspark's min()/max() and the Python built-ins.
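A minimal sketch of the workaround, runnable without Spark installed: the lambda below is a hypothetical stand-in for pyspark's single-argument max, simulating the shadowing a wildcard import causes.

```python
import builtins as py_builtin

# Simulate the name collision caused by a wildcard import such as
# `from pyspark.sql.functions import *` (hypothetical stand-in so this
# runs without Spark): pyspark's max takes a single column argument.
max = lambda col: "Column<max({})>".format(col)

# The real built-in is still reachable through the builtins alias.
print(py_builtin.max(3, 6))  # 6
```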
Upvotes: 6
Reputation: 330093
It looks like you have an import conflict in your application, most likely due to a wildcard import from pyspark.sql.functions:
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/__ / .__/\_,_/_/ /_/\_\ version 1.6.1
/_/
Using Python version 2.7.10 (default, Oct 19 2015 18:04:42)
SparkContext available as sc, HiveContext available as sqlContext.
In [1]: max(1, 2)
Out[1]: 2
In [2]: from pyspark.sql.functions import max
In [3]: max(1, 2)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-3-bb133f5d83e9> in <module>()
----> 1 max(1, 2)
TypeError: _() takes exactly 1 argument (2 given)
Unless you work in a relatively limited scope, it is best to either prefix:
from pyspark.sql import functions as sqlf
max(1, 2)
## 2
sqlf.max("foo")
## Column<max(foo)>
or alias:
from pyspark.sql.functions import max as max_
max(1, 2)
## 2
max_("foo")
## Column<max(foo)>
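The alias pattern above can be sketched without Spark installed; spark_max here is a hypothetical stand-in for pyspark.sql.functions.max, which takes one column argument and returns a Column expression.

```python
# Hypothetical stand-in for pyspark.sql.functions.max (single-argument
# column function), so the example runs without a Spark installation.
def spark_max(col):
    return "Column<max({})>".format(col)

# Alias pattern: bind the Spark function to a trailing-underscore name,
# mirroring `from pyspark.sql.functions import max as max_`, so the
# Python built-in `max` is never shadowed.
max_ = spark_max

print(max(1, 2))    # 2 -- the built-in still works
print(max_("foo"))  # Column<max(foo)>
```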
Upvotes: 10