max

Reputation: 4521

Group items in a list of tuples based on a common ID

I have a large dataset of synonyms (10000+) as a list of tuples that looks like this:

data = [
    (435347,'cat'),
    (435347,'feline'),
    (435347,'lion'),
    (6765756,'dog'),
    (6765756,'hound'),
    (6765756,'puppy'),
    (435347,'kitten'),
    (987977,'frog')
]

where each synonym is identified by an arbitrary shared ID, in this case 435347, 6765756, and 987977.

I would like to write a function that makes the data look like this:

processed_data = [
    (435347,'cat','feline','lion','kitten'),
    (6765756,'dog','hound','puppy'),
    (987977,'frog')
]

Any suggestions are greatly appreciated!

Upvotes: 2

Views: 1596

Answers (7)

user9158931

Reputation:

There are many ways to do this; here are some of them.

The data is:

data = [
    (435347,'cat'),
    (435347,'feline'),
    (435347,'lion'),
    (6765756,'dog'),
    (6765756,'hound'),
    (6765756,'puppy'),
    (435347,'kitten'),
    (987977,'frog')
]

itertools.groupby:

from itertools import groupby

print([tuple(i) for j,i in groupby(sorted(data),key=lambda x:x[0])])

collections.defaultdict:

from collections import defaultdict

d=defaultdict(list)
for i in data:
    d[i[0]].append(i)

print(d)

Without any module:

without_module={}
for i in data:
    if i[0] not in without_module:
        without_module[i[0]]=[i]
    else:
        without_module[i[0]].append(i)
print(without_module)
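
Note that the snippets above group whole (ID, word) tuples rather than producing the flat tuples from the question. One possible way to get that shape with groupby is sketched below; the name flattened and the choice to sort by ID only (so words keep their original order within each ID) are my assumptions, not part of the original answer:

```python
from itertools import groupby
from operator import itemgetter

data = [
    (435347, 'cat'), (435347, 'feline'), (435347, 'lion'),
    (6765756, 'dog'), (6765756, 'hound'), (6765756, 'puppy'),
    (435347, 'kitten'), (987977, 'frog'),
]

# Sort by ID only; sorted() is stable, so words keep their
# original relative order within each ID.
by_id = sorted(data, key=itemgetter(0))

# groupby() only merges adjacent items, hence the sort above.
flattened = [
    (key,) + tuple(word for _, word in group)
    for key, group in groupby(by_id, key=itemgetter(0))
]
print(flattened)
```

The result is ordered by ID because of the sort, not by first appearance in data.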

Upvotes: 0

r.ook

Reputation: 13868

A dictionary might be a better-suited solution for your problem:

data = [(435347,'cat'),(435347,'feline'),(435347,'lion'),(6765756,'dog'),(6765756,'hound'),(6765756,'puppy'),(435347,'kitten'),(987977,'frog')]
results = {}
for key, item in data:
    results.setdefault(key,[]).append(item)

Output:

{435347: ['cat', 'feline', 'lion', 'kitten'],
 987977: ['frog'],
 6765756: ['dog', 'hound', 'puppy']}

setdefault is a good candidate for your case. If the key doesn't exist, it creates the entry with the given default (an empty list here); either way, it returns the list stored under that key, so you can append to it directly.
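
To make that behaviour concrete, here is a small sketch of what setdefault returns in each case. The variable names first and second are only for illustration:

```python
results = {}

# Key missing: setdefault stores the default ([]) under the key and returns it.
first = results.setdefault(435347, [])
first.append('cat')

# Key present: setdefault ignores the default and returns the stored list.
second = results.setdefault(435347, [])
second.append('feline')

print(first is second)  # both names refer to the same list object
print(results)
```

Because both calls hand back the same list object, appending through either name updates the single entry stored in the dictionary.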

Upvotes: 0

pault

Reputation: 43504

Here is another approach which is a modification of my answer to another question. You can achieve this using reduce and map:

def reducer(x, y):
    if isinstance(x, dict):
        ykey, yval = y
        if ykey not in x:
            x[ykey] = [yval]
        else:
            x[ykey] += [yval]
        return x
    else:
        xkey, xval = x
        ykey, yval = y
        a = {xkey: [xval]}
        if ykey in a:
            a[ykey] += [yval]
        else:
            a[ykey] = [yval]
        return a

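from functools import reduce  # built in on Python 2; must be imported on Python 3

# list() is needed on Python 3, where map() returns an iterator
processed_data = list(map(lambda x: (x[0],) + tuple(x[1]), reduce(reducer, data).items()))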

Output:

>>> print(processed_data)
[(987977, 'frog'),
 (435347, 'cat', 'feline', 'lion', 'kitten'),
 (6765756, 'dog', 'hound', 'puppy')]

Explanation

Breaking it down step by step:

The function reducer() groups items by key into a dictionary. Each value in the dictionary is a list, to which the synonyms for that key are appended.

>>> print(reduce(reducer, data))
{435347: ['cat', 'feline', 'lion', 'kitten'],
 987977: ['frog'],
 6765756: ['dog', 'hound', 'puppy']}

We call .items() on the output of the reduce() function to get the (key, list) pairs (a list of tuples on Python 2, a view object on Python 3):

>>> print(reduce(reducer, data).items())
[(987977, ['frog']),
 (435347, ['cat', 'feline', 'lion', 'kitten']),
 (6765756, ['dog', 'hound', 'puppy'])]

Finally, we call map() to transform this output into the form you want.
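
As a possible simplification (not part of the original approach), reduce() accepts an initial value as a third argument. Passing an empty dict means the accumulator is always a dict, so the isinstance branch is unnecessary, and empty or single-item input is handled too. The helper name add_pair is hypothetical:

```python
from functools import reduce

data = [
    (435347, 'cat'), (435347, 'feline'), (435347, 'lion'),
    (6765756, 'dog'), (6765756, 'hound'), (6765756, 'puppy'),
    (435347, 'kitten'), (987977, 'frog'),
]

def add_pair(groups, pair):
    # groups is always a dict because reduce() starts from {} below
    key, value = pair
    groups.setdefault(key, []).append(value)
    return groups

grouped = reduce(add_pair, data, {})

# Prepend each key to its list of synonyms to get the requested tuples.
processed_data = [(key,) + tuple(values) for key, values in grouped.items()]
print(processed_data)
```

The order of the resulting tuples follows the dictionary's ordering, which is insertion order on Python 3.7 and later.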

Upvotes: 0

Vasilis G.

Reputation: 7844

You can try this one:

data = [(435347,'cat'),(435347,'feline'),(435347,'lion'),(6765756,'dog'),(6765756,'hound'),(6765756,'puppy'),(435347,'kitten'),(987977,'frog')]

dataset = set(i[0] for i in data)
processed_data = sorted((i,) + tuple(j[1] for j in data if j[0] == i) for i in dataset)
print(processed_data)

Output:

[(435347, 'cat', 'feline', 'lion', 'kitten'), (987977, 'frog'), (6765756, 'dog', 'hound', 'puppy')]

Upvotes: 1

Veera Balla Deva

Reputation: 788

dictionary = {}
for val in data:
    id_, name = val
    if id_ in dictionary:
        dictionary[id_].append(name)
    else:
        dictionary[id_] = [id_, name]
print(list(dictionary.values()))

Output:

[[435347, 'cat', 'feline', 'lion', 'kitten'], [6765756, 'dog', 'hound', 'puppy'], [987977, 'frog']]

Upvotes: 1

user8098055

Reputation:

Try this:

groups = {}

for x, y in data:
    group = groups.get(x, [])
    group.append(y)
    groups[x] = group

print(groups)

Output:

{987977: ['frog'], 435347: ['cat', 'feline', 'lion', 'kitten'], 6765756: ['dog', 'hound', 'puppy']}
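
Since the question asks for a function that returns a list of tuples, the same loop could be wrapped up roughly as follows. This is only a sketch; the function name group_synonyms is made up for illustration:

```python
def group_synonyms(pairs):
    """Group (id, word) pairs into (id, word1, word2, ...) tuples."""
    groups = {}
    for id_, word in pairs:
        group = groups.get(id_, [])  # empty list the first time an ID is seen
        group.append(word)
        groups[id_] = group
    # Prepend each ID to its words to match the format in the question.
    return [(id_,) + tuple(words) for id_, words in groups.items()]

data = [
    (435347, 'cat'), (435347, 'feline'), (435347, 'lion'),
    (6765756, 'dog'), (6765756, 'hound'), (6765756, 'puppy'),
    (435347, 'kitten'), (987977, 'frog'),
]
processed_data = group_synonyms(data)
print(processed_data)
```

This makes a single pass over the data, so it should scale fine to 10000+ entries.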

Upvotes: 5

Toxic Preys

Reputation: 24

Alright this is a SUGGESTION so don't be mad if it's wrong -

Try reading an input, then write a for loop that reads the data from a .txt file (or whatever source you prefer), with an if statement inside the for loop.

Code:

animal = input("Animal: ")
# Print every line of the file that contains the requested animal
with open("animal.txt") as f:
    for line in f:
        if animal in line.strip():
            print(line)

Personally, I would suggest this approach, with all the data kept in an array and the entries separated by \n.

Upvotes: -2
