Dinosaurius

Reputation: 8628

AssertionError: all exprs should be Column

I union two PySpark DataFrames and aggregate the result as follows:

exprs = [max(x) for x in ["col1","col2"]]
df = df1.union(df2).groupBy(['campk', 'ppk']).agg(*exprs)

But I get this error:

AssertionError: all exprs should be Column

What is wrong?

Upvotes: 9

Views: 29607

Answers (2)

Pankaj

Reputation: 151

Try the code below:

from pyspark.sql import functions as F
exprs = [F.max(x) for x in ["col1","col2"]]
print(*exprs)

Upvotes: 1

philantrovert

Reputation: 10082

exprs = [max(x) for x in ["col1","col2"]]

calls Python's built-in max, so it returns the character with the highest code point in each string, i.e. ['o', 'o'], rather than Column objects.
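A quick check in plain Python, without Spark, shows what the built-in max does with those strings. This is only a minimal sketch; the name builtin_result is introduced here for illustration.

```python
# Built-in max on a string compares its characters by code point.
builtin_result = [max(x) for x in ["col1", "col2"]]
print(builtin_result)  # ['o', 'o']

# Every element is a plain str, which agg() will not accept as an expression.
print(all(isinstance(v, str) for v in builtin_result))  # True
```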

Referring to the correct max, the one from pyspark.sql.functions, works:

>>> from pyspark.sql import functions as F
>>> exprs = [F.max(x) for x in ["col1","col2"]]
>>> print(exprs)
[Column<max(col1)>, Column<max(col2)>]

Upvotes: 17
