dannylin

Reputation: 25

Apply function over Spark dataset

Assume I have this in pyspark:

data = [{"age":1,"count":10},{"age":2,"count":20},{"age":3,"count":30}]

rdd = sc.parallelize(data)

I want to add 10 to "count" if "age" is larger than 2, like this:

data = [{"age":1,"count":10},{"age":2,"count":20},{"age":3,"count":40}]

How can I achieve this?

Upvotes: 2

Views: 3080

Answers (2)

koiralo

Reputation: 23119

You can convert the RDD to a DataFrame, which makes this much easier:

from pyspark.sql.functions import when

df = rdd.toDF()

df.withColumn("count", when(df['age'] > 2, df['count'] + 10).otherwise(df['count'])).show(truncate=False)

Output:

+---+-----+
|age|count|
+---+-----+
|1  |10   |
|2  |20   |
|3  |40   |
+---+-----+
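
If you need the result back in the question's original shape, an RDD of plain dicts, a minimal sketch building on the df above (variable names df2 and rdd2 are just illustrative) would be:

df2 = df.withColumn("count", when(df['age'] > 2, df['count'] + 10).otherwise(df['count']))
rdd2 = df2.rdd.map(lambda row: row.asDict())  # convert each Row back to a dict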

Upvotes: 4

mursalin

Reputation: 1181

There might be a better solution, but this one works for me.

# Note: this maps over the plain Python list `data`, not the RDD.
def add_count(x):
    x['count'] += 10  # mutate the dict in place, then return it
    return x

new_data = list(map(lambda x: x if x['age'] <= 2 else add_count(x), data))
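
Since the question starts from an RDD, the same conditional can also be applied directly with rdd.map. A minimal sketch, avoiding in-place mutation by building a new dict for the changed rows:

new_rdd = rdd.map(lambda x: x if x['age'] <= 2 else {**x, 'count': x['count'] + 10})
print(new_rdd.collect())  # expected: [{'age': 1, 'count': 10}, {'age': 2, 'count': 20}, {'age': 3, 'count': 40}]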

Upvotes: 4
