I'm looking for help with PySpark: I want to add a new column based on matching list values.
I have a list of values stored in the variable unique_ids:
```
[Row(card_id=1), Row(card_id=2)]
```
For each value in the list, if it matches the column value, I want to count the number of rows with that value and add a new column containing the count.
This is how I get the list:
```python
unique_ids = data.select('card_id').distinct().collect()
```
Example DataFrame:
card_id |
---|
1 |
1 |
2 |
1 |
2 |
1 |
Required DataFrame:
card_id | Count |
---|---|
1 | 4 |
1 | 4 |
2 | 2 |
1 | 4 |
2 | 2 |
1 | 4 |
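A plain-Python sketch (no Spark involved) of the matching-and-counting logic described above can help pin down the expected output on small data. It uses the example card_id values as an ordinary list; card_ids, counts, and rows_with_count are hypothetical names chosen for illustration. Note that this literal version scans the data once per unique id, which does not scale, so in Spark a single grouped count is preferable.

```python
# Plain-Python sketch (not Spark) of the logic described in the question.
# card_ids stands in for the card_id column of the example DataFrame.
card_ids = [1, 1, 2, 1, 2, 1]

# Rough equivalent of data.select('card_id').distinct().collect(),
# without the Row wrapper.
unique_ids = sorted(set(card_ids))

# For each unique value, count how many rows match it.
counts = {uid: sum(1 for cid in card_ids if cid == uid) for uid in unique_ids}

# Attach the matching count to every row, keeping the original row order.
rows_with_count = [(cid, counts[cid]) for cid in card_ids]

for cid, count in rows_with_count:
    print(cid, count)
```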
Thanks
Use the count function over a window partitioned by card_id:
```python
import pyspark.sql.functions as F
from pyspark.sql.window import Window

# Count the rows in each card_id partition and attach that count to every row.
result = data.withColumn('count', F.count('card_id').over(Window.partitionBy('card_id')))
result.show()
```
```
+-------+-----+
|card_id|count|
+-------+-----+
|      1|    4|
|      1|    4|
|      1|    4|
|      1|    4|
|      2|    2|
|      2|    2|
+-------+-----+
```
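To see what the window expression computes, here is a plain-Python model (not Spark) of F.count('card_id').over(Window.partitionBy('card_id')): rows are split into partitions by card_id, and each row receives the size of its own partition. It assumes card_id contains no nulls (F.count skips nulls, and this sketch does not model that), and card_ids and result are hypothetical names for illustration.

```python
from itertools import groupby

# Plain-Python model of a count over a window partitioned by card_id.
# Assumes no null card_id values (F.count would skip them).
card_ids = [1, 1, 2, 1, 2, 1]

result = []
# itertools.groupby only groups adjacent items, so sort first to form partitions.
for key, group in groupby(sorted(card_ids)):
    partition = list(group)
    partition_count = len(partition)
    # Every row in the partition gets the same count.
    result.extend((key, partition_count) for _ in partition)

for card_id, count in result:
    print(card_id, count)
```

The rows come out grouped by card_id, mirroring the sample output above. Spark does not guarantee row order without an explicit orderBy, so add one if the original order matters.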