Reputation: 5389
My data after left outer join is in the following format:
# (u'session_id', ((u'prod_id', u'user_id'), (u'prod_label', u'user_id')))
# (u'session_id', ((u'20133', u'129001032'), None))
# (u'session_id', ((u'2024574', u'61370212'), (u'Loc1', u'61370212')))
I want data in the following format now: (user_id, prod_id, prod_label)
When I try the following to get that format, I get an error:
result_rdd = rdd1.map(lambda (session_id, (prod_id, user_id), (prod_label, user_id)): user_id, prod_id, prod_label)
NameError: global name 'prod_id' is not defined
Upvotes: 0
Views: 4265
Reputation: 330073
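This is simply not valid syntax for returning a tuple from a lambda expression. Without parentheses, only `user_id` is the lambda body; `prod_id` and `prod_label` are parsed as extra arguments to `map` and looked up as global names, which is where the `NameError` comes from. If you want to return a tuple, it has to be wrapped in parentheses: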
rdd1.map(lambda (session_id, ((prod_id, user_id_1), right)):
    # right is None when the left outer join found no match
    (user_id_1, prod_id, right[0] if right is not None else None))
Also keep in mind that tuple parameter unpacking is not portable (it was removed in Python 3 by PEP 3113), and that duplicate parameter names are not allowed and will result in a `SyntaxError`.
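If portability matters, one option is to do the unpacking inside a named function instead of in the parameter list. The sketch below assumes the record shape shown in the question; `to_row` and the sample `records` list are hypothetical names used only for illustration, and the records are plain Python tuples so the snippet runs without Spark:

```python
def to_row(record):
    """Flatten one left-outer-join record into (user_id, prod_id, prod_label)."""
    # Unpack in the body rather than in the signature (works on Python 2 and 3)
    session_id, (left, right) = record
    prod_id, user_id = left
    # right is None when the left outer join found no matching row
    prod_label = right[0] if right is not None else None
    return (user_id, prod_id, prod_label)


# Sample records copied from the question
records = [
    (u'session_id', ((u'20133', u'129001032'), None)),
    (u'session_id', ((u'2024574', u'61370212'), (u'Loc1', u'61370212'))),
]

rows = [to_row(r) for r in records]
```

With an actual RDD the same function could be passed directly, as in `result_rdd = rdd1.map(to_row)`.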
Upvotes: 2