Reputation: 67
I am fairly new to Spark and Spark development.
I have two RDDs that I merge with a join. The result of that join is the following:
(u'10611', ((u'Laura', u'Mcgee'), (u'66821', u'COMPLETE')))
(u'4026', ((u'Mary', u'Smith'), (u'3237', u'COMPLETE')))
(u'4026', ((u'Mary', u'Smith'), (u'4847', u'CLOSED')))
As you can see, each record has a key and two nested tuples. I want to merge the two tuples so that each record is a key and a single tuple, like the following:
(u'10611', (u'Laura', u'Mcgee', u'66821', u'COMPLETE'))
(u'4026', (u'Mary', u'Smith', u'3237', u'COMPLETE'))
(u'4026', (u'Mary', u'Smith', u'4847', u'CLOSED'))
Also, how can I format this as tab-delimited text before calling saveAsTextFile? For example:
10611 Laura Mcgee 66821 COMPLETE
4026 Mary Smith 3237 COMPLETE
4026 Mary Smith 4847 CLOSED
I have something like this, but not sure how to access it with the tuple:
.map(lambda x: "%s\t%s\t%s\t%s" %(x[0], x[1], x[2], x[3]))
Upvotes: 0
Views: 1566
Reputation: 9257
You can also flatten the nested tuples with a generator expression passed to tuple(), as in this example:
my_tuple = (u'10611', ((u'Laura', u'Mcgee'), (u'66821', u'COMPLETE')))
new_tuple = (my_tuple[0], tuple(j for k in my_tuple[1] for j in k))
Output:
print(new_tuple)
>>> ('10611', ('Laura', 'Mcgee', '66821', 'COMPLETE'))
Then, to format your output you can do something like this too:
print("{0}\t{1}".format(new_tuple[0], "\t".join(new_tuple[1])))
Output:
>>> 10611 Laura Mcgee 66821 COMPLETE
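To apply the two steps above to every record rather than a single tuple, they can be wrapped in small helper functions. The following is only a sketch: the names flatten_record, to_tsv_line, and joined are made up for illustration, and a plain Python list stands in for the joined RDD so that it runs without Spark.

```python
def flatten_record(record):
    """Turn (key, ((a, b), (c, d))) into (key, (a, b, c, d))."""
    key, nested = record
    # Same flattening idea as above: walk each inner tuple in order.
    return (key, tuple(item for group in nested for item in group))


def to_tsv_line(record):
    """Render (key, (a, b, c, d)) as a single tab-delimited line."""
    key, values = record
    return "\t".join((key,) + values)


# Sample records copied from the question; a list stands in for the RDD.
joined = [
    (u'10611', ((u'Laura', u'Mcgee'), (u'66821', u'COMPLETE'))),
    (u'4026', ((u'Mary', u'Smith'), (u'3237', u'COMPLETE'))),
    (u'4026', ((u'Mary', u'Smith'), (u'4847', u'CLOSED'))),
]

lines = [to_tsv_line(flatten_record(r)) for r in joined]
for line in lines:
    print(line)
```

On a real RDD, the same helpers could presumably be chained as joined_rdd.map(flatten_record).map(to_tsv_line).saveAsTextFile(output_path), where joined_rdd and output_path are placeholders for your own join result and output directory.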
Upvotes: 2
Reputation: 19621
Assuming your data is consistently formatted, you can merge your tuples with the tuple addition (concatenation) operator:
>>> weird = (u'10611', ((u'Laura', u'Mcgee'), (u'66821', u'COMPLETE')))
>>> weirdMerged = (weird[0], (weird[1][0]+weird[1][1]))
>>> weirdMerged
(u'10611', (u'Laura', u'Mcgee', u'66821', u'COMPLETE'))
Writing the result out as text should be simple, although the nested structure makes it slightly awkward. Your lambda is close, but x[2] and x[3] do not exist on a (key, tuple) pair. You could instead do:
>>> print('\t'.join((weirdMerged[0],)+weirdMerged[1]))
10611 Laura Mcgee 66821 COMPLETE
I'm not sure that's much better, but it works.
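As a rough sketch of how the addition approach might plug into a Spark job, the merge and the formatting can be split into two functions, one that only touches the value side of the pair and one that builds the output line. The function names and the joined_rdd and output_path placeholders are assumptions for illustration, and the runnable part uses a single sample record instead of an RDD.

```python
def merge_values(pair):
    """Concatenate the two inner tuples: ((a, b), (c, d)) -> (a, b, c, d)."""
    left, right = pair
    return left + right


def format_line(record):
    """Join the key and the merged values with tabs."""
    key, values = record
    return "\t".join((key,) + values)


# One sample record from the question.
record = (u'4026', ((u'Mary', u'Smith'), (u'4847', u'CLOSED')))

merged = (record[0], merge_values(record[1]))
line = format_line(merged)
print(merged)
print(line)

# With PySpark, assuming joined_rdd holds the join result and output_path
# is the target directory (both hypothetical names), this would be roughly:
#   joined_rdd.mapValues(merge_values).map(format_line).saveAsTextFile(output_path)
```

Using mapValues for the merge leaves the key untouched, so the key only needs to be handled once, in the formatting step.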
Upvotes: 2