user11704694

Reputation:

alias for count in PySpark

I am new to PySpark and I am trying to use an alias for the count function. For some reason, if I put agg in front of count the alias works, but if I am not aggregating the alias gives me an error.

.(count("firstName").alias("cnt"))

doesn't work;

.agg(count("firstName").alias("cnt"))

works.

I want to understand what is wrong with the first query.

Upvotes: 11

Views: 8840

Answers (2)

Manish Mehra

Reputation: 1501

count can be used as a transformation as well as an action.

When you call .count() on a regular DataFrame, it works as an action and yields the result directly.

When count() is used as a function inside filter, agg, select, etc., it builds a column expression that we can rename using .alias.

We can do something like:

from pyspark.sql.functions import count, sum

order_items.filter('order_item_order_id = 2').select(
    count('order_item_quantity').alias('order_item_count'),
    sum('order_item_quantity').alias('order_quantity'),
    sum('order_item_subtotal').alias('order_revenue')
).show()
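To see why the alias fails outside select/agg, here is a minimal sketch assuming a DataFrame df with a firstName column: DataFrame.count() is an action that returns a plain Python int, so there is nothing to call .alias on, while count() from pyspark.sql.functions builds a Column expression that select or agg can attach an alias to.

from pyspark.sql.functions import count

df.count()                                  # action: returns a plain Python int, so .alias does not exist here
df.select(count("firstName").alias("cnt"))  # count() builds a Column expression, which can be aliased
df.agg(count("firstName").alias("cnt"))     # same idea via agg, as in the question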

When running count() on a grouped DataFrame, the resulting column is named 'count'. To rename it, you can use

groupedDF.withColumnRenamed('count', 'new_count')

or

companies_df.groupBy('sector').count().toDF("sector", "new count").show()
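For example, a minimal sketch assuming the same companies_df with a sector column; the agg form on the second line is an equivalent way to name the column up front:

from pyspark.sql.functions import count

companies_df.groupBy('sector').count().withColumnRenamed('count', 'new_count').show()
companies_df.groupBy('sector').agg(count('*').alias('new_count')).show()  # same counts, aliased at aggregation time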

Upvotes: 0

C_codio

Reputation: 196

You can try this:

.count().withColumnRenamed("count","cnt")

We cannot alias the count function directly; the resulting 'count' column has to be renamed afterwards.
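As a minimal sketch, assuming a DataFrame df with a firstName column; .count() on the grouped data produces a column literally named "count", which is then renamed to "cnt":

df.groupBy("firstName").count().withColumnRenamed("count", "cnt").show()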

Upvotes: 16
