Reputation: 4521
I have a large dataset of synonyms (10000+) as a list of tuples that looks like this:
data = [
(435347,'cat'),
(435347,'feline'),
(435347,'lion'),
(6765756,'dog'),
(6765756,'hound'),
(6765756,'puppy'),
(435347,'kitten'),
(987977,'frog')
]
where each group of synonyms is identified by an arbitrary shared ID, in this case 435347, 6765756, and 987977.
I would like to write a function that makes the data look like this:
processed_data = [
(435347,'cat','feline','lion','kitten'),
(6765756,'dog','hound','puppy'),
(987977,'frog')
]
Any suggestions are greatly appreciated!
Upvotes: 2
Views: 1596
Reputation:
There are many ways to do this; here are some of them.
The data is:
data = [
(435347,'cat'),
(435347,'feline'),
(435347,'lion'),
(6765756,'dog'),
(6765756,'hound'),
(6765756,'puppy'),
(435347,'kitten'),
(987977,'frog')
]
Using itertools.groupby:
from itertools import groupby
# groupby only groups consecutive items, so the data is sorted first
print([tuple(i) for j,i in groupby(sorted(data),key=lambda x:x[0])])
Using collections.defaultdict:
from collections import defaultdict
d=defaultdict(list)
for i in data:
    d[i[0]].append(i)  # each group holds the full (id, word) tuples
print(d)
Without any module:
without_module={}
for i in data:
    if i[0] not in without_module:
        without_module[i[0]]=[i]
    else:
        without_module[i[0]].append(i)
print(without_module)
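Note that the three snippets above group the full (id, word) tuples rather than producing the flat rows asked for in the question. Here is a minimal sketch of one way to get that shape, assuming Python 3.7+ (where dicts preserve insertion order). The function name group_synonyms is hypothetical, chosen only for illustration:

```python
from collections import defaultdict

def group_synonyms(pairs):
    """Group (id, word) pairs into (id, word1, word2, ...) tuples."""
    grouped = defaultdict(list)
    for key, word in pairs:
        grouped[key].append(word)  # store only the word, not the whole pair
    # On Python 3.7+ the IDs come out in first-seen order
    return [(key, *words) for key, words in grouped.items()]

data = [
    (435347, 'cat'), (435347, 'feline'), (435347, 'lion'),
    (6765756, 'dog'), (6765756, 'hound'), (6765756, 'puppy'),
    (435347, 'kitten'), (987977, 'frog'),
]
processed_data = group_synonyms(data)
print(processed_data)
```

This makes a single pass over the data, so it should stay fast for 10000+ pairs.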
Upvotes: 0
Reputation: 13868
A dictionary might be better suited to your problem:
data = [(435347,'cat'),(435347,'feline'),(435347,'lion'),(6765756,'dog'),(6765756,'hound'),(6765756,'puppy'),(435347,'kitten'),(987977,'frog')]
results = {}
for key, item in data:
    results.setdefault(key,[]).append(item)
Output:
{435347: ['cat', 'feline', 'lion', 'kitten'],
987977: ['frog'],
6765756: ['dog', 'hound', 'puppy']}
setdefault is a good candidate for your case. It creates a dictionary entry with the given default if the key doesn't exist, and returns the entry's value either way, so the append works whether or not the key was already there.
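To illustrate that behavior, here is a small sketch (the variable names first, second, and rows are only for illustration). It also shows one possible way to flatten the dictionary into the tuple rows from the question:

```python
results = {}

first = results.setdefault(435347, [])   # key missing: inserts [] and returns it
first.append('cat')

second = results.setdefault(435347, [])  # key present: returns the existing list
second.append('feline')

print(first is second)  # both names refer to the same list object
print(results)

# Flatten each (key, list) entry into a single (key, word, word, ...) tuple
rows = [(key,) + tuple(words) for key, words in results.items()]
print(rows)
```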
Upvotes: 0
Reputation: 43504
Here is another approach, which is a modification of my answer to another question. You can achieve this using reduce and map:
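from functools import reduce  # reduce is not a builtin in Python 3

def reducer(x, y):
    if isinstance(x, dict):
        # x is the accumulated dictionary; add y to it
        ykey, yval = y
        if ykey not in x:
            x[ykey] = [yval]
        else:
            x[ykey] += [yval]
        return x
    else:
        # first call: x and y are both (id, word) tuples
        xkey, xval = x
        ykey, yval = y
        a = {xkey: [xval]}
        if ykey in a:
            a[ykey] += [yval]
        else:
            a[ykey] = [yval]
        return a

processed_data = list(map(lambda x: (x[0],) + tuple(x[1]), reduce(reducer, data).items()))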
Output:
>>> print(processed_data)
[(987977, 'frog'),
(435347, 'cat', 'feline', 'lion', 'kitten'),
(6765756, 'dog', 'hound', 'puppy')]
Explanation
Breaking it down step by step:
The function reducer() groups items by key into a dictionary. Each value in the dictionary is a list, to which the synonym values are appended.
>>> print(reduce(reducer, data))
{435347: ['cat', 'feline', 'lion', 'kitten'],
987977: ['frog'],
6765756: ['dog', 'hound', 'puppy']}
We call .items() on the output of the reduce() function to get this as a list of tuples:
>>> print(reduce(reducer, data).items())
[(987977, ['frog']),
(435347, ['cat', 'feline', 'lion', 'kitten']),
(6765756, ['dog', 'hound', 'puppy'])]
Finally, we call map() to transform this output into the form you want.
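A possible simplification, not part of the original answer: passing an empty dict as the third argument to reduce() means the accumulator is always a dictionary, so the isinstance branch is not needed and empty or single-item input is handled too. The helper name add_pair is made up for this sketch:

```python
from functools import reduce

def add_pair(groups, pair):
    """Add one (id, word) pair to the accumulated dict and return the dict."""
    key, word = pair
    groups.setdefault(key, []).append(word)
    return groups

data = [
    (435347, 'cat'), (435347, 'feline'), (435347, 'lion'),
    (6765756, 'dog'), (6765756, 'hound'), (6765756, 'puppy'),
    (435347, 'kitten'), (987977, 'frog'),
]

# The third argument is the initial accumulator value
grouped = reduce(add_pair, data, {})
processed_data = [(key,) + tuple(words) for key, words in grouped.items()]
print(processed_data)
```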
Upvotes: 0
Reputation: 7844
You can try this one:
data = [(435347,'cat'),(435347,'feline'),(435347,'lion'),(6765756,'dog'),(6765756,'hound'),(6765756,'puppy'),(435347,'kitten'),(987977,'frog')]
dataset = set(i[0] for i in data)
processed_data = sorted([(tuple([i]) + tuple(j[1] for j in data if j[0]==i)) for i in dataset])
print(processed_data)
Output:
[(435347, 'cat', 'feline', 'lion', 'kitten'), (987977, 'frog'), (6765756, 'dog', 'hound', 'puppy')]
Upvotes: 1
Reputation: 788
dictionary = {}
for val in data:
    id_, name = val
    if id_ in dictionary:
        dictionary[id_].append(name)
    else:
        dictionary[id_] = [id_, name]
print(list(dictionary.values()))
>>> [[435347, 'cat', 'feline', 'lion', 'kitten'], [6765756, 'dog', 'hound', 'puppy'], [987977, 'frog']]
Upvotes: 1
Reputation:
Try this:
groups = {}
for x, y in data:
    group = groups.get(x, [])
    group.append(y)
    groups[x] = group
print(groups)
Output:
{987977: ['frog'], 435347: ['cat', 'feline', 'lion', 'kitten'], 6765756: ['dog', 'hound', 'puppy']}
Upvotes: 5
Reputation: 24
Alright, this is a SUGGESTION, so don't be mad if it's wrong.
Try taking an input, then write a for statement that reads the data from a .txt file (or whatever you prefer), with an if statement underneath the for.
Code:
animal=input("Animal: ")
with open("animal.txt") as f:
    for line in f:
        # print every line that mentions the animal
        if animal in line.strip():
            print(line)
I would personally suggest this, with all the data kept in an array and the entries separated by \n.
Upvotes: -2