ANK

Reputation: 78

Add a new column to a PySpark DataFrame based on matching values from a list

I am looking for help with PySpark: I want to add a new column holding the number of rows that match each value in a list.

I have a list of values stored in the variable unique_ids:

[Row(card_id=1), Row(card_id=2)]

For each value in the list, if it matches the column value, I want to count the number of rows that match that value and add a new column holding the count.

This is how I get the list:

unique_ids = data.select('card_id').distinct().collect()

Example df:

card_id
1
1
2
1
2
1

Required dataframe:

card_id Count
1 4
1 4
2 2
1 4
2 2
1 4

Thanks

Upvotes: 1

Views: 320

Answers (1)

AdibP

Reputation: 2939

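Use the count function over a window partitioned by card_id. Every row then gets the size of its own card_id group, so the collected list is not needed: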

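import pyspark.sql.functions as F
from pyspark.sql.window import Window

# One partition per distinct card_id; count() runs over each partition
# and the result is attached to every row in it.
w = Window.partitionBy('card_id')
result = data.withColumn('count', F.count('card_id').over(w))
result.show()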

+-------+-----+
|card_id|count|
+-------+-----+
|      1|    4|
|      1|    4|
|      1|    4|
|      1|    4|
|      2|    2|
|      2|    2|
+-------+-----+
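
To make clear what the windowed count computes, here is a minimal plain-Python sketch of the same two steps: count the rows per card_id, then attach that count to every row. It is only an illustration of the logic on the question's sample data, not Spark code; the names card_ids, counts and rows_with_count are made up for this example. In PySpark the same two-step idea could presumably also be written as data.groupBy('card_id').count() joined back to data on card_id, though the window version above avoids the explicit join.

```python
from collections import Counter

# Sample data from the question: one card_id per row (hypothetical in-memory copy).
card_ids = [1, 1, 2, 1, 2, 1]

# Step 1: count how many rows share each card_id
# (this is what count() produces within each window partition).
counts = Counter(card_ids)

# Step 2: attach each row's group count to that row, keeping the original row order.
rows_with_count = [(card_id, counts[card_id]) for card_id in card_ids]

for card_id, count in rows_with_count:
    print(card_id, count)
```

This sketch keeps the rows in their original order, as in the required dataframe from the question. The Spark output above shows the rows grouped by card_id instead, since the window shuffles rows by partition and Spark does not guarantee the original order unless you sort explicitly.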

Upvotes: 1
