Reputation: 311
I would like to find all possible pairwise combinations of the items within an Iterable object.
My input is:
Object1|DrDre|1.0
Object1|Plane and a Disaster|2.0
Object1|Tikk Takk Tikk|3.5
Object1|Tennis Dope|5.0
Object2|DrDre|11.0
Object2|Plane and a Disaster|14.0
Object2|Just My Luck|2.0
Object2|Tennis Dope|45.0
The expected output would be something like this:
[(('DrDre', 'Plane and a Disaster'), (11.0, 14.0, 1.0, 2.0)),
(('DrDre', 'Tikk Takk Tikk'), (1.0, 3.5)),
(('DrDre', 'Tennis Dope'), (11.0, 45.0, 1.0, 5.0)),
(('Plane and a Disaster', 'Tikk Takk Tikk'), (2.0, 3.5)),
(('Plane and a Disaster', 'Tennis Dope'), (14.0, 45.0, 2.0, 5.0)),
(('Tikk Takk Tikk', 'Tennis Dope'), (3.5, 5.0)),
(('DrDre', 'Just My Luck'), (11.0, 2.0)),
(('Plane and a Disaster', 'Just My Luck'), (14.0, 2.0)),
(('Just My Luck', 'Tennis Dope'), (2.0, 45.0))]
This is my current code, which does not give me the right combinations in the end.
from itertools import combinations

def iterate(iterable):
    # Flatten an iterable of tuples into a single tuple.
    r = []
    for v1_iterable in iterable:
        for v2 in v1_iterable:
            r.append(v2)
    return tuple(r)

def parseVector(line):
    '''
    Parse each line of the specified data file, assuming a "|" delimiter.
    Converts each rating to a float.
    '''
    line = line.split("|")
    return line[0], (line[1], float(line[2]))

def FindPairs(object_id, items_with_usage):
    '''
    For each object, find all item-item pair combos (i.e. items with the same object).
    '''
    for item1, item2 in combinations(items_with_usage, 2):
        return (item1[0], item2[0]), (item1[1], item2[1])
'''
Obtain the sparse object-item matrix:
    object_id -> [(item_id_1, rating_1),
                  (item_id_2, rating_2),
                  ...]
'''
object_item_pairs = lines.map(parseVector).groupByKey().map(
    lambda p: sampleInteractions(p[0], p[1], 500)).cache()
'''
Get all item-item pair combos:
    (item1, item2) -> [(item1_rating, item2_rating),
                       (item1_rating, item2_rating),
                       ...]
'''
pairwise_objects = object_item_pairs.filter(
    lambda p: len(p[1]) > 1).map(
    lambda p: FindPairs(p[0], p[1])).groupByKey()
x = pairwise_objects.mapValues(iterate)
x.collect()
This only gives me back the first pair, and nothing else.
[(('DrDre', 'Plane and a Disaster'), (11.0, 14.0, 1.0, 2.0))]
Did I misunderstand the functionality of the combinations() function?
Thanks for your input.
Upvotes: 0
Views: 604
Reputation: 2108
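combinations() is working as expected. The problem is that the return inside the for loop exits FindPairs on the first iteration, so only the first pair of each object is ever produced. I think you can rewrite FindPairs so that it collects every pair:
from itertools import combinations

def FindPairs(object_id, items_with_usage):
    '''
    For each object, find all item-item pair combos (i.e. items with the same object).
    '''
    t = []
    for item1, item2 in combinations(items_with_usage, 2):
        # Key: (item1_name, item2_name); value: (item1_rating, item2_rating)
        t.append(((item1[0], item2[0]), (item1[1], item2[1])))
    return t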
Now the function returns a list containing every pair of the combination, not just the first one.
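To see the difference in isolation, here is a minimal sketch in plain Python (no Spark needed). The names first_pair_only and all_pairs are hypothetical stand-ins for the original and the rewritten FindPairs, and the sample list is Object1's items from the question.

```python
from itertools import combinations

# Object1's (item, rating) tuples from the question.
items = [("DrDre", 1.0), ("Plane and a Disaster", 2.0),
         ("Tikk Takk Tikk", 3.5), ("Tennis Dope", 5.0)]

def first_pair_only(items_with_usage):
    # Mirrors the original bug: return exits the loop on the first iteration.
    for item1, item2 in combinations(items_with_usage, 2):
        return (item1[0], item2[0]), (item1[1], item2[1])

def all_pairs(items_with_usage):
    # Collects every pair before returning.
    return [((item1[0], item2[0]), (item1[1], item2[1]))
            for item1, item2 in combinations(items_with_usage, 2)]

print(first_pair_only(items))  # a single (names, ratings) pair
print(len(all_pairs(items)))   # 6 pairs for 4 items
```

Both functions iterate over the same six combinations; only the second one keeps them all.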
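Then filter and map as before:
pairwise_objects = object_item_pairs.filter(lambda p: len(p[1]) > 1)
pairwise_objects = pairwise_objects.map(lambda p: FindPairs(p[0], p[1]))
Each element of the resulting RDD is a list of pairs, one list per object: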
[[(('DrDre', 'Plane and a Disaster'), (1.0, 2.0)),
(('DrDre', 'Tikk Takk Tikk'), (1.0, 3.5)),
(('DrDre', 'Tennis Dope'), (1.0, 5.0)),
(('Plane and a Disaster', 'Tikk Takk Tikk'), (2.0, 3.5)),
(('Plane and a Disaster', 'Tennis Dope'), (2.0, 5.0)),
(('Tikk Takk Tikk', 'Tennis Dope'), (3.5, 5.0))], # end of the first line of the RDD
[(('DrDre', 'Plane and a Disaster'),(11.0, 14.0)),
(('DrDre', 'Just My Luck'), (11.0, 2.0)),
(('DrDre', 'Tennis Dope'), (11.0, 45.0)),
(('Plane and a Disaster', 'Just My Luck'), (14.0, 2.0)),
(('Plane and a Disaster', 'Tennis Dope'), (14.0, 45.0)),
(('Just My Luck', 'Tennis Dope'), (2.0, 45.0))]]
Because each element is itself a list, use flatMap to flatten the RDD so that every element is a single (pair, ratings) tuple. Do this before grouping by key and applying iterate:
pairwise_objects = pairwise_objects.flatMap(lambda p: p).groupByKey().mapValues(iterate)
Final output:
[(('DrDre', 'Tennis Dope'), (1.0, 5.0, 11.0, 45.0)),
(('DrDre', 'Plane and a Disaster'), (1.0, 2.0, 11.0, 14.0)),
(('Plane and a Disaster', 'Tennis Dope'), (2.0, 5.0, 14.0, 45.0)),
(('Plane and a Disaster', 'Just My Luck'), (14.0, 2.0)),
(('Plane and a Disaster', 'Tikk Takk Tikk'), (2.0, 3.5)),
(('DrDre', 'Tikk Takk Tikk'), (1.0, 3.5)),
(('Tikk Takk Tikk', 'Tennis Dope'), (3.5, 5.0)),
(('DrDre', 'Just My Luck'), (11.0, 2.0)),
(('Just My Luck', 'Tennis Dope'), (2.0, 45.0))]
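If you want to check the logic without a Spark cluster, the same steps can be sketched in plain Python. This is only an approximation under stated assumptions: the dictionaries below stand in for groupByKey, the inner loop stands in for flatMap, and the names parse_vector, find_pairs, by_object and ratings_by_pair are hypothetical. The sampleInteractions step from the question is left out because its definition is not shown.

```python
from collections import defaultdict
from itertools import combinations

lines = [
    "Object1|DrDre|1.0",
    "Object1|Plane and a Disaster|2.0",
    "Object1|Tikk Takk Tikk|3.5",
    "Object1|Tennis Dope|5.0",
    "Object2|DrDre|11.0",
    "Object2|Plane and a Disaster|14.0",
    "Object2|Just My Luck|2.0",
    "Object2|Tennis Dope|45.0",
]

def parse_vector(line):
    # "Object1|DrDre|1.0" -> ("Object1", ("DrDre", 1.0))
    obj, item, rating = line.split("|")
    return obj, (item, float(rating))

def find_pairs(items_with_usage):
    # Every item-item pair for one object: ((name1, name2), (rating1, rating2)).
    return [((a[0], b[0]), (a[1], b[1]))
            for a, b in combinations(items_with_usage, 2)]

# Stand-in for the first groupByKey(): object -> [(item, rating), ...]
by_object = defaultdict(list)
for obj, item_rating in map(parse_vector, lines):
    by_object[obj].append(item_rating)

# Stand-in for filter + map(FindPairs) + flatMap + groupByKey + mapValues(iterate):
# the ratings of each pair are concatenated into one tuple.
ratings_by_pair = defaultdict(tuple)
for items in by_object.values():
    if len(items) > 1:
        for pair, ratings in find_pairs(items):
            ratings_by_pair[pair] += ratings

for pair, ratings in ratings_by_pair.items():
    print(pair, ratings)
```

In this sketch a pair's key depends on the order in which the items appear for each object, which follows the input order here. Spark does not guarantee the order of values after groupByKey, so sorting the items before calling combinations may be worth considering if the same pair shows up under two different keys.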
Upvotes: 1