Reputation: 13
I'm having trouble merging tuples in Python that have items in common. I'd like to end up with a single tuple that still preserves the differing items, ideally by concatenating them in the position where they appear.
I'm analyzing ~100k captions connected to videos and specifically looking for verbs in them. I have a big list of tuples like this:
(verb, caption, video_id)
If a caption contains more than one verb, it appears in my list once per verb:
list = [(verb1, caption, video_id), (verb2, caption, video_id), (verb3, caption, video_id)]
I would like to obtain this:
(verb1|verb2|verb3, caption, video_id)
(The | is not mandatory; I simply want all three verbs in the first position of the tuple.)
I need this because I'm writing the result to a CSV file to be checked manually, and I would like to avoid checking the same caption and video_id multiple times.
Here is a more realistic example:
list = [
('look', 'Mario takes the bag, looks around and runs away.','video_id_001'),
('run', 'Mario takes the bag, looks around and runs away.','video_id_001'),
('take', 'Mario takes the bag, looks around and runs away.','video_id_001')
]
Upvotes: 1
Views: 53
Reputation: 11070
I'm probably missing something, but how about:
lst = [
    ('look', 'Mario takes the bag, looks around and runs away.', 'video_id_001'),
    ('run', 'Mario takes the bag, looks around and runs away.', 'video_id_001'),
    ('take', 'Mario takes the bag, looks around and runs away.', 'video_id_001'),
]
caption = "Mario takes the bag, looks around and runs away."
vid_id = "video_id_001"
verbs = set()  # a set keeps each distinct verb only once
for tup in lst:
    verbs.add(tup[0])
# sorted() gives a stable order, since a set has no defined ordering
print(('|'.join(sorted(verbs)), caption, vid_id))
Upvotes: 0
Reputation: 164773
collections.defaultdict is your friend.
from collections import defaultdict
lst = [('verb1', 'caption', 'video_id'),
       ('verb2', 'caption', 'video_id'),
       ('verb3', 'caption', 'video_id')]
aggregator = defaultdict(list)
# first create a dictionary mapping (caption, video_id) -> list of verbs
for verb, caption, video_id in lst:
    aggregator[(caption, video_id)].append(verb)
# then turn each dictionary entry back into a (verbs, caption, video_id) tuple
result = [('|'.join(verbs), caption, video_id)
          for (caption, video_id), verbs in aggregator.items()]
# [('verb1|verb2|verb3', 'caption', 'video_id')]
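Since the question mentions writing the result to a CSV file, here is a rough sketch of how the grouping could feed into the standard csv module. The function name merge_verbs, the header row, and the in-memory buffer are my own assumptions for illustration, not something from the question; it also skips verbs that repeat within the same caption, which may or may not be what you want.

```python
import csv
import io
from collections import defaultdict


def merge_verbs(rows, sep="|"):
    """Merge (verb, caption, video_id) rows sharing caption and video_id.

    Hypothetical helper: name and signature are illustrative only.
    """
    grouped = defaultdict(list)
    for verb, caption, video_id in rows:
        verbs = grouped[(caption, video_id)]
        if verb not in verbs:  # skip a verb already seen for this caption
            verbs.append(verb)
    # Regular dicts keep insertion order (Python 3.7+), so groups come out
    # in the order their first row was seen.
    return [(sep.join(verbs), caption, video_id)
            for (caption, video_id), verbs in grouped.items()]


rows = [
    ('look', 'Mario takes the bag, looks around and runs away.', 'video_id_001'),
    ('run', 'Mario takes the bag, looks around and runs away.', 'video_id_001'),
    ('take', 'Mario takes the bag, looks around and runs away.', 'video_id_001'),
]
merged = merge_verbs(rows)

# Written to an in-memory buffer so the sketch runs anywhere; for a real
# file, use open("some_file.csv", "w", newline="") instead.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["verbs", "caption", "video_id"])  # assumed header names
writer.writerows(merged)
print(buffer.getvalue())
```

The csv writer quotes the caption automatically because it contains commas, so the file should still open cleanly in a spreadsheet for manual checking.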
Upvotes: 1