Simon Breton

Reputation: 2876

Why is alias not working with groupby and count

I'm running the following block and I'm wondering why .alias is not working:

data = [(1, "siva", 100), (2, "siva", 200),(3, "siva", 300),
        (4, "siva4", 400),(5, "siva5", 500)]
schema = ['id', 'name', 'salary']

df = spark.createDataFrame(data, schema=schema)
df.show()
display(df.select('name').groupby('name').count().alias('test'))

Is there a specific reason for this? In what case is .alias() supposed to work in a situation like this? And why is no error returned?

Upvotes: 0

Views: 1018

Answers (1)

Pav3k

Reputation: 909

You could change the syntax a bit to apply the alias with no issue:

from pyspark.sql import functions as F

df.select('name').groupby('name').agg(F.count("name").alias("test")).show()

# output
+-----+----+
| name|test|
+-----+----+
|siva4|   1|
|siva5|   1|
| siva|   3|
+-----+----+

I am not 100% sure, but my understanding is that .count() returns an entire DataFrame, so .alias() is applied to the whole DataFrame instead of to a single column. Aliasing a DataFrame is itself valid (it just gives the DataFrame a name you can use to qualify its columns, e.g. in joins), which is why no error is raised even though the column name stays count.

Upvotes: 1
