Chetan Hirapara
Chetan Hirapara

Reputation: 694

How order by with IN clause is works in pyspark?

I am running below query in spark.sql which has IN clause with order by statement

Here is my original data:

enter image description here

Query:

SELECT 
    *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY category IN ('TV','laptop'), created_at desc) as rn 
FROM table2

Result:

enter image description here

Can you help me to interpret this result? I have added 'TV' and 'laptop' in IN clause then it should be at top row (rn = 1 to 4)

I have observed same behaviour in Snowflake as well. After adding IN ('TV','laptop') in the query getting this unpredicted result.

Upvotes: 2

Views: 58

Answers (1)

Chetan Hirapara
Chetan Hirapara

Reputation: 694

I have played around this much and then got the logic is when we use IN clause it is return binary result 0 or 1

When I have added IN ('TV','laptop') into the query. It will check all the result which has this category and assigned it 1 to those records rest will be 0. Now based on this binary all the 0 records will come first and the records with 1 due to it is ASC by default.

enter image description here

After removing 'laptop' from query, all the records with TV will be ordered at the last due to TV ones are set to 1 rest will be set to 0

enter image description here

I hope you will get clear understanding from this description.

Upvotes: 2

Related Questions