HimanshuSPaul

Reputation: 316

PySpark: What is the difference between these two uses of the desc function on a DataFrame?

What is the difference between empDF["Last Name"].desc() and desc("Last Name")? Both give the same result, and both involve a shuffle operation.

>>> empDF.orderBy(empDF["Last Name"].desc()).show(4)
+------+----------+---------+------+------+
|Emp ID|First Name|Last Name|Gender|Salary|
+------+----------+---------+------+------+
|977421|   Zackary|  Zumwalt|     M|177521|
|741150|    Awilda|    Zuber|     F|144972|
|274620|  Eleanora|     Zook|     F|151026|
|242757|      Erin|     Zito|     F|127254|
+------+----------+---------+------+------+
only showing top 4 rows

>>> empDF.orderBy(desc("Last Name")).show(4)
+------+----------+---------+------+------+
|Emp ID|First Name|Last Name|Gender|Salary|
+------+----------+---------+------+------+
|977421|   Zackary|  Zumwalt|     M|177521|
|741150|    Awilda|    Zuber|     F|144972|
|274620|  Eleanora|     Zook|     F|151026|
|242757|      Erin|     Zito|     F|127254|
+------+----------+---------+------+------+
only showing top 4 rows

One thing I noticed: to call desc() with the column name as its argument, I had to add from pyspark.sql.functions import desc. Is the former a method of the Spark DataFrame column, and the latter a Spark SQL function? Is there any documentation or explanation that clarifies this? I did not find any.

Thanks in advance.

Upvotes: 1

Views: 254

Answers (2)

HimanshuSPaul

Reputation: 316

After going through the documentation several times, I understand now. There are two desc() implementations in the pyspark.sql package. One is a function in the pyspark.sql.functions module; it takes a mandatory column argument. The second is a method of the pyspark.sql.Column class; it takes no argument.

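Both produce the same descending sort expression. They differ only in how they are called: the function has to be imported from pyspark.sql.functions and is given the column, while the method is called on an existing Column. With the proper import statement, they can be used interchangeably.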

Upvotes: 0

Som

Reputation: 6323

Both are the same thing. As per the documentation and the Scala source code (functions.desc):

/**
 * Returns a sort expression based on the descending order of the column.
 * {{{
 *   df.sort(asc("dept"), desc("age"))
 * }}}
 *
 * @group sort_funcs
 * @since 1.3.0
 */
def desc(columnName: String): Column = Column(columnName).desc

Note that internally desc(columnName) calls Column(columnName).desc, so the two are the same. Treat them as two alternatives that perform the same operation.
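To make the delegation concrete, here is a minimal sketch in plain Python. It is a toy model, not PySpark's actual code: the Column and SortOrder classes and the order_by helper are hypothetical stand-ins written only for illustration. It shows how a module-level desc(name) function can simply forward to a desc() method on a column object, which is the pattern in the Scala source quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SortOrder:
    """Toy stand-in for a sort expression: a column name plus a direction."""
    column: str
    descending: bool


@dataclass(frozen=True)
class Column:
    """Toy stand-in for a DataFrame column (NOT pyspark.sql.Column)."""
    name: str

    def desc(self) -> SortOrder:
        # Method form: called on an existing column, takes no argument.
        return SortOrder(self.name, descending=True)


def desc(column_name: str) -> SortOrder:
    # Function form: takes the column name and delegates to the method,
    # mirroring `Column(columnName).desc` in the quoted Scala source.
    return Column(column_name).desc()


def order_by(rows, order: SortOrder):
    # Hypothetical helper: sort a list of dicts by the sort expression.
    return sorted(rows, key=lambda r: r[order.column], reverse=order.descending)


rows = [
    {"Last Name": "Zito"},
    {"Last Name": "Zumwalt"},
    {"Last Name": "Zook"},
    {"Last Name": "Zuber"},
]

by_method = order_by(rows, Column("Last Name").desc())
by_function = order_by(rows, desc("Last Name"))
```

Because the function only wraps the method, both calls build an equal sort expression and therefore the same ordering. That matches the identical output the question shows for the two orderBy calls.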

Upvotes: 1
