Reputation: 2760
I have an RDD like this:
[('anger', 166),
('lyon', 193),
('marseilles_1', 284),
('nice', 203),
('paris_2', 642),
('paris_3', 330),
('troyes', 214),
('marseilles_2', 231),
('nantes', 207),
('orlean', 196),
('paris_1', 596),
('rennes', 180),
('toulouse', 177)]
I need to merge paris_1, paris_2 and paris_3 into one row called paris.
I have no idea how to proceed and haven't found any answers.
Can you help me?
Upvotes: 0
Views: 56
Reputation: 10096
You can use a regular expression to strip the underscore and digits from each key, leaving just the city name, and then sum the values per city with reduceByKey:
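import re

# Remove "_" and digits from each key ("paris_2" -> "paris"),
# then add up the counts of keys that now share the same city name.
rdd \
    .map(lambda l: (re.sub('[_0-9]', '', l[0]), l[1])) \
    .reduceByKey(lambda x, y: x + y) \
    .collect()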
[('anger', 166),
('lyon', 193),
('nice', 203),
('paris', 1568),
('troyes', 214),
('marseilles', 515),
('nantes', 207),
('orlean', 196),
('rennes', 180),
('toulouse', 177)]
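The following is a plain-Python sketch of the same map-then-reduce-by-key logic, so the key rewriting can be tried without a Spark cluster. It swaps in a narrower pattern, r'_\d+$', which only removes a trailing "_<digits>" suffix. This assumes every split city key follows that form; the names city_name and merge_by_city are hypothetical helpers, not part of PySpark.

```python
import re
from collections import defaultdict

# The (key, count) pairs from the question.
pairs = [
    ('anger', 166), ('lyon', 193), ('marseilles_1', 284), ('nice', 203),
    ('paris_2', 642), ('paris_3', 330), ('troyes', 214), ('marseilles_2', 231),
    ('nantes', 207), ('orlean', 196), ('paris_1', 596), ('rennes', 180),
    ('toulouse', 177),
]

# Assumption: a split city always ends in "_<digits>" (e.g. "paris_2").
SUFFIX = re.compile(r'_\d+$')


def city_name(key):
    """Hypothetical helper: drop a trailing "_<digits>", so 'paris_2' -> 'paris'."""
    return SUFFIX.sub('', key)


def merge_by_city(pairs):
    """Hypothetical stand-in for rdd.map(...).reduceByKey(lambda x, y: x + y)."""
    totals = defaultdict(int)
    for key, count in pairs:
        # "map" step: rewrite the key; "reduceByKey" step: sum counts per city.
        totals[city_name(key)] += count
    return dict(totals)


merged = merge_by_city(pairs)
print(merged['paris'], merged['marseilles'])  # 1568 515
```

In the RDD version the same pattern would go into the map step, e.g. `.map(lambda l: (re.sub(r'_\d+$', '', l[0]), l[1]))`. Unlike '[_0-9]', the anchored pattern leaves a hypothetical key such as 'saint_etienne' intact instead of turning it into 'saintetienne'.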
Upvotes: 2