Reputation: 5907
I have an RDD that I got by reading a MongoDB collection, and now I want to change some values and write the data back to the same collection or to another one.
mr1 = sc.mongoRDD('mongodb://localhost:27017/test_database.test2')
type(mr1) #<class 'pyspark.rdd.PipelinedRDD'>
mr1.collect()
#[{u'_id': ObjectId('58089490d7531cd8b071f48c'), u'name': u'ravi', u'sal': u'2000'}, {u'_id': ObjectId('58089491d7531cd8b071f48d'), u'name': u'ravi', u'sal': u'3000'}]
#I want to change the name 'ravi' to 'Satya'
mr2 = mr1.map( lambda x: x['name'].replace('ravi','SATYA'))
#Output: [u'SATYA', u'SATYA'] ##only the names, not the whole documents
#Expected: [{u'_id': ObjectId('58089490d7531cd8b071f48c'), u'name': u'SATYA', u'sal': u'2000'}, {u'_id': ObjectId('58089491d7531cd8b071f48d'), u'name': u'SATYA', u'sal': u'3000'}]
How can I apply a map function here so that I get back the same documents as in mr1, with only the names replaced?
Thanks.
Upvotes: 2
Views: 7708
Reputation:
Try:
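def replace(x, key, fr, to):
    # Work on a copy so the original document is left untouched
    d = x.copy()
    if key in d:
        # Use the fr/to arguments instead of hard-coded values
        d[key] = d[key].replace(fr, to)
    return d
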
mr1.map(lambda x: replace(x, 'name', 'ravi','SATYA'))
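The helper can be checked without a Spark cluster. The following is only a sketch: it assumes the documents behave like ordinary Python dicts, uses made-up `_id` values, and uses the built-in `map` as a stand-in for `RDD.map`.

```python
def replace(x, key, fr, to):
    # Copy the document so the input dict is not modified
    d = x.copy()
    if key in d:
        d[key] = d[key].replace(fr, to)
    return d

# Hypothetical sample documents shaped like the ones in the question
docs = [
    {'_id': 1, 'name': 'ravi', 'sal': '2000'},
    {'_id': 2, 'name': 'ravi', 'sal': '3000'},
]

# Built-in map stands in for mr1.map(...)
updated = list(map(lambda x: replace(x, 'name', 'ravi', 'SATYA'), docs))
```

Each element of `updated` is a full document with only `name` changed, and `docs` itself is left as it was.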
Upvotes: 3
Reputation: 5907
Got it working:
def rep(x):
    # Change the name only on an exact match, then return the whole document
    if x['name'] == 'ravi':
        x['name'] = 'SATYA'
    return x

mr2 = mr1.map(rep)
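One difference between the two approaches in this thread: the equality check in rep() only matches a name that is exactly 'ravi', while `str.replace` also rewrites names that merely contain it. A small sketch on plain dicts, where the function names and the 'ravikumar' value are made up for illustration:

```python
def rep_exact(x):
    # Whole-value match, as in rep(); works on a copy instead of mutating x
    d = dict(x)
    if d.get('name') == 'ravi':
        d['name'] = 'SATYA'
    return d

def rep_substring(x):
    # Substring replacement via str.replace
    d = dict(x)
    if 'name' in d:
        d['name'] = d['name'].replace('ravi', 'SATYA')
    return d

# Hypothetical documents; 'ravikumar' is not in the original collection
docs = [{'name': 'ravi'}, {'name': 'ravikumar'}]

exact = [rep_exact(d) for d in docs]
substring = [rep_substring(d) for d in docs]
```

Here `exact` leaves 'ravikumar' alone, while `substring` turns it into 'SATYAkumar', so which one to use depends on whether partial matches should be rewritten.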
Upvotes: 3