Merge tuples in Python if they have elements in common and concatenate the differing elements in their position

Question

I have a problem in Python merging some tuples with items in common. The point is that I'd like to have just one tuple, but also preserve the different items, possibly by concatenating them in the position they have.

I'm analyzing ~100k captions connected to videos and specifically looking for verbs in them. I have a big list of tuples like this:

(verb, caption, video_id)

The problem is that if a caption contains more than one verb, it appears in my list once per verb:

list = [(verb1, caption, video_id), (verb2, caption, video_id), (verb3, caption, video_id)]

I would like to obtain this:

(verb1|verb2|verb3, caption, video_id)

(the | is not mandatory; I would simply like to have all 3 verbs in the first position of the tuple)

I need this because I'm outputting this to a csv file to be manually checked and I would like to avoid checking the same caption and video_id multiple times.

Here is a more realistic example:

lst = [
  ('look', 'Mario takes the bag, looks around and runs away.', 'video_id_001'),
  ('run',  'Mario takes the bag, looks around and runs away.', 'video_id_001'),
  ('take', 'Mario takes the bag, looks around and runs away.', 'video_id_001')
]

jpp · Accepted Answer

collections.defaultdict is your friend.

from collections import defaultdict

lst = [('verb1', 'caption', 'video_id'),
       ('verb2', 'caption', 'video_id'),
       ('verb3', 'caption', 'video_id')]

aggregator = defaultdict(list)

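# first create a dictionary mapping (caption, video_id) -> list of verbs
for verb, caption, video_id in lst:
    aggregator[(caption, video_id)].append(verb)

# then rebuild one tuple per (caption, video_id), joining its verbs with '|'
result = [('|'.join(verbs), caption, video_id)
          for (caption, video_id), verbs in aggregator.items()]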

# [('verb1|verb2|verb3', 'caption', 'video_id')]
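
As a rough sketch of how this might look on the example data from the question, including the CSV step the question mentions: the helper name merge_verbs, its sep parameter, and the in-memory io.StringIO buffer are assumptions made for illustration, not part of the original answer. Verbs come out in the order they appear in the input, since dictionaries keep insertion order in Python 3.7+.

```python
import csv
import io
from collections import defaultdict


def merge_verbs(rows, sep='|'):
    """Merge (verb, caption, video_id) rows that share caption and video_id.

    Hypothetical helper: the name and the `sep` parameter are illustrative.
    """
    grouped = defaultdict(list)
    for verb, caption, video_id in rows:
        grouped[(caption, video_id)].append(verb)
    # One output tuple per distinct (caption, video_id) pair.
    return [(sep.join(verbs), caption, video_id)
            for (caption, video_id), verbs in grouped.items()]


rows = [
    ('look', 'Mario takes the bag, looks around and runs away.', 'video_id_001'),
    ('run',  'Mario takes the bag, looks around and runs away.', 'video_id_001'),
    ('take', 'Mario takes the bag, looks around and runs away.', 'video_id_001'),
]

merged = merge_verbs(rows)
# merged == [('look|run|take',
#             'Mario takes the bag, looks around and runs away.',
#             'video_id_001')]

# Write to an in-memory buffer here; a real script would pass a file opened
# with open(path, 'w', newline='') to csv.writer instead.
buffer = io.StringIO()
csv.writer(buffer).writerows(merged)
csv_text = buffer.getvalue()
```

Because the caption itself contains commas, csv.writer quotes that field automatically, so the '|' separator between verbs does not interfere with the CSV column layout.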
