user3610141

Reputation: 305

Python functions such as max() don't work in a pyspark application

The Python built-in max(3, 6) works in the pyspark shell, but when the same call is put in an application and submitted, it throws an error: TypeError: _() takes exactly 1 argument (2 given)

Upvotes: 4

Views: 4792

Answers (2)

DeadLock

Reputation: 468

If you get this error even after verifying that you have NOT used from pyspark.sql.functions import *, then try the following:

Use import builtins as py_builtin and then call the built-in through that prefix, e.g. py_builtin.max()

*Adding David Arenburg's and user3610141's comments as an answer, as that is what helped me fix my problem in Databricks, where there was a name collision between pyspark's min() and max() and the Python built-ins.
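A minimal self-contained sketch of this workaround (assuming Python 3, where the module is named builtins; on Python 2 it is __builtin__). A locally defined max() stands in for the one that `from pyspark.sql.functions import *` would pull in:

```python
# Simulate the collision: define a max() that shadows the built-in,
# just as a wildcard import from pyspark.sql.functions would.
def max(col):  # stand-in for pyspark.sql.functions.max
    return "Column<max(%s)>" % col

import builtins as py_builtin  # Python 3; Python 2 uses __builtin__

# The shadowed name now expects a single column argument...
print(max("foo"))
# ...while the real built-in stays reachable through the prefix.
print(py_builtin.max(3, 6))  # prints 6
```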

Upvotes: 6

zero323

Reputation: 330093

It looks like you have an import conflict in your application, most likely due to a wildcard import from pyspark.sql.functions:

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 1.6.1
      /_/

Using Python version 2.7.10 (default, Oct 19 2015 18:04:42)
SparkContext available as sc, HiveContext available as sqlContext.

In [1]: max(1, 2)
Out[1]: 2

In [2]: from pyspark.sql.functions import max

In [3]: max(1, 2)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-bb133f5d83e9> in <module>()
----> 1 max(1, 2)

TypeError: _() takes exactly 1 argument (2 given)

Unless you work in a relatively limited scope, it is best to either prefix:

from pyspark.sql import functions as sqlf

max(1, 2)
## 2

sqlf.max("foo")
## Column<max(foo)>

or alias:

from pyspark.sql.functions import max as max_

max(1, 2)
## 2

max_("foo")
## Column<max(foo)>
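The alias pattern runs the same way without a Spark installation. In the sketch below, a hypothetical module named fake_functions (built on the fly with the standard library) stands in for pyspark.sql.functions:

```python
# A stand-in for pyspark.sql.functions so the example runs without
# Spark: a throwaway module that exports its own max().
import sys
import types

fake = types.ModuleType("fake_functions")
fake.max = lambda col: "Column<max(%s)>" % col
sys.modules["fake_functions"] = fake

# The alias pattern: import the module's max() under a different
# name so the built-in max() is left untouched.
from fake_functions import max as max_

print(max(1, 2))    # 2 -- the built-in still works
print(max_("foo"))  # the module's column-style max
```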

Upvotes: 10
