I am new to PySpark. I am trying to use alias with the count function. For some reason, if I use agg in front of count, then alias works, but if I am not aggregating, then alias gives me an error.
.(count("firstName").alias("cnt"))
doesn't work;
.agg(count("firstName").alias("cnt"))
works.
I want to understand what is wrong with the first query.
Upvotes: 11
Views: 8840
Reputation: 1501
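count can act as either an action or a transformation, depending on where it is called.
Calling .count() on a regular DataFrame is an action: it runs the job and returns the number of rows as an integer, so there is no column to alias.
When count is used as a function (pyspark.sql.functions.count) inside agg, select, etc., it is a column expression, and we can name it with .alias.
For example, we can do something like: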
from pyspark.sql.functions import count, sum

(order_items
    .filter('order_item_order_id = 2')
    .select(
        count('order_item_quantity').alias('order_item_count'),
        sum('order_item_quantity').alias('order_quantity'),
        sum('order_item_subtotal').alias('order_revenue'))
    .show())
When running count() on a grouped DataFrame, the resulting column is always named 'count'. To rename it, you can use groupedDF.withColumnRenamed('count', 'new_count')
or companies_df.groupBy('sector').count().toDF("sector", "new count").show() (note that toDF renames all columns by position).
Upvotes: 0
Reputation: 196
You can try this:
.count().withColumnRenamed("count","cnt")
We cannot alias the column directly here, because count() on grouped data returns a DataFrame rather than a Column.
Upvotes: 16