Reputation: 143
I hope you can help me with this Python function:
def comparapal(lista):  # lista is a list of lists; each inner list has 4 elements
    listaPalabras = []
    for item in lista:
        # keep the item only if its 3rd element is a key in the dictionary
        if item[2] in eagles_dict:
            # build a new list from elements 2 and 3
            listaPalabras.append([item[1], item[2]])
The listaPalabras result:
[
['bien', 'NP00000'],
['gracia', 'NCFP000'],
['estar', 'VAIP1S0'],
['bien', 'RG'],
['huevo', 'NCMS000'],
['calcio', 'NCMS000'],
['leche', 'NCFS000'],
['proteina', 'NCFS000'],
['francisco', 'NP00000'],
['ya', 'RG'],
['ser', 'VSIS3S0'],
['cosa', 'NCFS000']
]
My question is: how can I compare the first element of each list so that, if the word is the same, their tags (the second element) are compared?
Sorry for being ambiguous. The function has to return a list of lists with 3 elements each: the word, the tag, and the number of occurrences of the word. To count the words, I need to compare each word with the others, and if two or more words are the same, compare their tags to check whether they differ. If the tags are different, the words are counted separately.
result -> [['bien', 'NP00000',1],['bien', 'RG',1]] -> the same word twice, but counted separately because the tags differ. Thanks in advance.
Upvotes: 0
Views: 8583
Reputation: 49816
import collections
inlist = [
['bien', 'NP00000'],
['gracia', 'NCFP000'],
['estar', 'VAIP1S0'],
['bien', 'RG'],
['huevo', 'NCMS000'],
['calcio', 'NCMS000'],
['leche', 'NCFS000'],
['proteina', 'NCFS000'],
['francisco', 'NP00000'],
['ya', 'RG'],
['ser', 'VSIS3S0'],
['cosa', 'NCFS000']
]
[(a,b,v) for (a,b),v in collections.Counter(map(tuple,inlist)).items()]
#=>[('proteina', 'NCFS000', 1), ('francisco', 'NP00000', 1), ('ser', 'VSIS3S0', 1), ('bien', 'NP00000', 1), ('calcio', 'NCMS000', 1), ('estar', 'VAIP1S0', 1), ('huevo', 'NCMS000', 1), ('gracia', 'NCFP000', 1), ('bien', 'RG', 1), ('cosa', 'NCFS000', 1), ('ya', 'RG', 1), ('leche', 'NCFS000', 1)]
You want to count the number of occurrences of each pair. The counter
expression does that. The list comprehension formats this as triples.
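If the result has to be a list of lists, as the question asks, the same idea might be wrapped up as below. This is only a sketch: the function name `count_pairs` and the shortened sample data are made up for illustration, and the output order relies on Python 3.7+, where `Counter` keeps keys in order of first appearance.

```python
from collections import Counter

def count_pairs(pairs):
    """Return [word, tag, count] for each distinct (word, tag) pair."""
    # Lists are not hashable, so each pair is turned into a tuple first.
    counts = Counter(tuple(pair) for pair in pairs)
    return [[word, tag, n] for (word, tag), n in counts.items()]

# Hypothetical shortened input with one repeated pair.
sample = [['bien', 'NP00000'], ['bien', 'RG'], ['bien', 'RG'], ['ya', 'RG']]
result = count_pairs(sample)
print(result)
# [['bien', 'NP00000', 1], ['bien', 'RG', 2], ['ya', 'RG', 1]]
```

The same word with different tags ends up in separate rows because the whole (word, tag) pair is the counting key.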
Upvotes: 2
Reputation: 7822
What specific output do you need? I'm not sure exactly what you need to do, but if you want to group the items that share the same word, you can turn this structure into a dictionary and manipulate it later:
>>> new = {}
>>> for i, j in a:  # <-- a = listaPalabras
...     if new.get(i) is None:
...         new[i] = [j]
...     else:
...         new[i].append(j)
which will give us:
{'francisco': ['NP00000'], 'ser': ['VSIS3S0'], 'cosa': ['NCFS000'], 'ya': ['RG'], 'bien': ['NP00000', 'RG'], 'estar': ['VAIP1S0'], 'calcio': ['NCMS000'], 'leche': ['NCFS000'], 'huevo': ['NCMS000'], 'gracia': ['NCFP000'], 'proteina': ['NCFS000']}
and then later on you can do:
>>> for i in new:
...     if len(new[i]) > 1:
...         print("compare {this} and {that}".format(this=new[i][0], that=new[i][1]))
will print:
compare NP00000 and RG #for key bien
EDIT: In the first step you can also use defaultdict, as Marcin suggested in the comments. It would look like this:
>>> from collections import defaultdict
>>> d = defaultdict(list)
>>> for i, j in a:
...     d[i].append(j)
EDIT2 (answer to OP's comment)
result = []
for i in d:
    item = []
    item.append(i)
    item.extend(d[i])
    item.append(len(d[i]))
    result.append(item)
This gives us:
[['francisco', 'NP00000', 1], ['ser', 'VSIS3S0', 1], ['cosa', 'NCFS000', 1], ['ya', 'RG', 1], ['bien', 'NP00000', 'RG', 2], ['estar', 'VAIP1S0', 1], ['calcio', 'NCMS000', 1], ['leche', 'NCFS000', 1], ['huevo', 'NCMS000', 1], ['gracia', 'NCFP000', 1], ['proteina', 'NCFS000', 1]]
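The grouped rows above put both tags for 'bien' into a single entry. If one row per word and tag is wanted instead, as in the question's expected result, the grouping could keep a count per tag and be flattened afterwards. This is a rough sketch under that assumption; the name `rows_per_tag` and the three-item sample are invented for illustration.

```python
from collections import Counter, defaultdict

def rows_per_tag(pairs):
    """Group tags under each word, then emit one [word, tag, count] row per tag."""
    tags_by_word = defaultdict(Counter)
    for word, tag in pairs:
        tags_by_word[word][tag] += 1  # count each tag separately within its word
    rows = []
    for word, tag_counts in tags_by_word.items():
        for tag, n in tag_counts.items():
            rows.append([word, tag, n])
    return rows

# Hypothetical shortened input.
pairs = [['bien', 'NP00000'], ['gracia', 'NCFP000'], ['bien', 'RG']]
rows = rows_per_tag(pairs)
print(rows)
```

Keeping the intermediate word-to-tags mapping is only useful if the tags of a word also need to be compared with each other; otherwise counting the pairs directly is simpler.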
Upvotes: 1
Reputation: 571
A purely list-based solution is possible, of course, but it requires additional looping. If efficiency is important, it might be better to replace listaPalabras with a dict.
def comparapal(lista):
    listaPalabras = []
    for item in lista:
        if item[2] in eagles_dict:
            listaPalabras.append([item[1], item[2]])
    last_tt = [None, None]
    for tt in sorted(listaPalabras):
        if tt == last_tt:
            print("Observed %s twice" % tt)
        elif tt[0] == last_tt[0]:
            print("Observed %s and %s" % (tt, last_tt))
        last_tt = tt
This gives you:
Observed ['bien', 'RG'] and ['bien', 'NP00000']
If this does not suit your purposes, please clarify your question.
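For reference, the dict-based variant mentioned at the top might look roughly like this. It is a sketch, not a drop-in replacement: the name `comparapal_counts` is hypothetical, `eagles_dict` is passed in as a parameter instead of being read as a global, and the sample rows and dictionary values are placeholders.

```python
def comparapal_counts(lista, eagles_dict):
    """Count each (word, tag) pair whose tag is a key of eagles_dict."""
    counts = {}
    for item in lista:
        word, tag = item[1], item[2]
        if tag in eagles_dict:
            key = (word, tag)
            counts[key] = counts.get(key, 0) + 1  # one dict lookup, no sorting
    return [[word, tag, n] for (word, tag), n in counts.items()]

# Placeholder data: 4-element rows, with the word and tag at positions 1 and 2.
eagles = {'NP00000': True, 'RG': True}
lista = [
    [0, 'bien', 'NP00000', None],
    [1, 'bien', 'RG', None],
    [2, 'bien', 'RG', None],
    [3, 'hola', 'XX', None],  # tag not in the dictionary, so it is skipped
]
out = comparapal_counts(lista, eagles)
print(out)
```

This makes a single pass over the input and avoids the sort, at the cost of not printing which pairs were seen together.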
Upvotes: 0