Simon Breton

Reputation: 2876

Why is alias not working with groupby and count

I'm running the following block and I'm wondering why .alias is not working:

data = [(1, "siva", 100), (2, "siva", 200),(3, "siva", 300),
        (4, "siva4", 400),(5, "siva5", 500)]
schema = ['id', 'name', 'salary']

df = spark.createDataFrame(data, schema=schema)
df.show()
display(df.select('name').groupby('name').count().alias('test'))

Is there a specific reason for this? In what case is .alias() supposed to work in a situation like this? And why is no error returned?

Upvotes: 0

Views: 1018

Answers (1)

Pav3k

Reputation: 909

You could change the syntax a bit to apply the alias with no issue:

from pyspark.sql import functions as F

df.select('name').groupby('name').agg(F.count("name").alias("test")).show()

# output
+-----+----+
| name|test|
+-----+----+
|siva4|   1|
|siva5|   1|
| siva|   3|
+-----+----+

I am not 100% sure, but my understanding is that .count() returns an entire DataFrame, so .alias() is applied to the whole DataFrame instead of to a single column. Aliasing a DataFrame is itself valid (it just gives the DataFrame a name you can use to qualify its columns, e.g. in joins), which is why no error is raised even though the column name stays count.

Upvotes: 1
