Jectson

Reputation: 79

List with tuples in Python

I'm new to Python and have some questions about lists and tuples. I've got a list of sentences, where each sentence is itself a list of tuples pairing a word with its wordclass-tag. This is one element in my list:

[('It', 'PPS'), ('says', 'VBZ'), ('that', 'CS'), ('``', '``'), ('in', 'IN'), ('the', 'AT'), ('event', 'NN'), ('Congress', 'NP'), ('does', 'DOZ'), ('provide', 'VB'), ('this', 'DT'), ('increase', 'NN'), ('in', 'IN'), ('federal', 'JJ'), ('funds', 'NNS'), ("''", "''"), (',', ','), ('the', 'AT'), ('State', 'NN-TL'), ('Board', 'NN-TL'), ('of', 'IN-TL'), ('Education', 'NN-TL'), ('should', 'MD'), ('be', 'BE'), ('directed', 'VBN'), ('to', 'TO'), ('``', '``'), ('give', 'VB'), ('priority', 'NN'), ("''", "''"), ('to', 'IN'), ('teacher', 'NN'), ('pay', 'NN'), ('raises', 'NNS'), ('.', '.')]

As you can see, each word has a wordclass-tag. How can I search for a word + wordclass pair in my list? For example, how would I check whether the element above contains the word "federal" attached to the wordclass-tag "JJ"?

Help is much appreciated.

Upvotes: 1

Views: 198

Answers (3)

piokuc

Reputation: 26184

To check whether the word 'federal' tagged with 'JJ' is in your list:

your_list = [('It', 'PPS'), ('says', 'VBZ'), ('that', 'CS'), ('``', '``'), ('in', 'IN'), ('the', 'AT'), ('event', 'NN'), ('Congress', 'NP'), ('does', 'DOZ'), ('provide', 'VB'), ('this', 'DT'), ('increase', 'NN'), ('in', 'IN'), ('federal', 'JJ'), ('funds', 'NNS'), ("''", "''"), (',', ','), ('the', 'AT'), ('State', 'NN-TL'), ('Board', 'NN-TL'), ('of', 'IN-TL'), ('Education', 'NN-TL'), ('should', 'MD'), ('be', 'BE'), ('directed', 'VBN'), ('to', 'TO'), ('``', '``'), ('give', 'VB'), ('priority', 'NN'), ("''", "''"), ('to', 'IN'), ('teacher', 'NN'), ('pay', 'NN'), ('raises', 'NNS'), ('.', '.')]
print(('federal', 'JJ') in your_list)

Using list comprehension syntax you can do more interesting things with your list, for example see all tags of all occurrences of a word:

print(" ".join([wordclass for word, wordclass in your_list if word == 'federal']))

It's worth writing a few small functions for generic operations on the data structure you work with, such as checking whether it contains a word or a tag:

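def hasWord(l, word):
    # True if any (word, tag) pair in l has the given word
    for w, wordclass in l:
        if w == word:
            return True
    return False

def hasTag(l, tag):
    # True if any (word, tag) pair in l carries the given tag
    for w, wordclass in l:
        if wordclass == tag:
            return True
    return False

if hasTag(your_list, 'JJ'):
    print(your_list)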
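The same two checks can also be written with the built-in any(), which stops at the first match just like the explicit loops. This is only a minimal sketch, assuming the same (word, tag) tuple layout; the names has_word, has_tag and sample are illustrative and not part of the code above:

```python
def has_word(tagged, word):
    # True if any (word, tag) pair has the given word
    return any(w == word for w, _tag in tagged)

def has_tag(tagged, tag):
    # True if any (word, tag) pair carries the given tag
    return any(t == tag for _w, t in tagged)

# Small made-up sample in the same shape as your_list
sample = [('in', 'IN'), ('federal', 'JJ'), ('funds', 'NNS')]
print(has_word(sample, 'federal'))  # True
print(has_tag(sample, 'VB'))        # False
```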

To answer your question in the comments:

# assumes sentences is a list of tagged sentences, each shaped like your_list
for sentence in sentences:
    if ('federal', 'JJ') in sentence:
        print(sentence)
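For a version that runs on its own, here is a rough sketch with a tiny made-up sentences list (the data and the names target and matching are assumptions for illustration). It collects the matching sentences first and then prints just their words:

```python
# Hypothetical data: each sentence is a list of (word, tag) tuples
sentences = [
    [('the', 'AT'), ('federal', 'JJ'), ('funds', 'NNS')],
    [('teacher', 'NN'), ('pay', 'NN'), ('raises', 'NNS')],
]

target = ('federal', 'JJ')

# Keep only the sentences that contain the exact (word, tag) pair
matching = [s for s in sentences if target in s]

for s in matching:
    # Join the words back together, dropping the tags
    print(" ".join(word for word, _tag in s))
```

With this sample data only the first sentence is kept.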

Upvotes: 1

SaCry

Reputation: 114

My first approach was:

def find_tuple(input, l):
    for (e1, e2) in l:
        if e1==input[0] and e2==input[1]:
            return True
    return False

It is straightforward but rigid, and it only fits your exact problem. A more general but equivalent approach:

def my_any(iterable, input, func):
    for element in iterable:
        if func(element, input):
            return True
    return False

input = ("federal","JJ")
l = [("It", "PPS"),("federal","JJ")]
print(my_any(l, input, lambda x, y: x[0]==y[0] and x[1]==y[1]))

Pass in a lambda function to decide for yourself what counts as a match. If an exact match on both word and tag is all you need, the easiest approach is this:

input = ("federal","JJ")
l = [("It", "PPS"),("federal","JJ")]
if input in l:
    print("True")
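To illustrate where a custom matching function pays off over a plain membership test, here is a hedged sketch with two made-up predicates. The names same_ignoring_case, tag_starts_with and tagged are hypothetical, and my_any is redefined compactly on top of the built-in any() so the snippet runs by itself:

```python
def my_any(iterable, target, func):
    # True if func(element, target) holds for at least one element
    return any(func(element, target) for element in iterable)

def same_ignoring_case(pair, target):
    # Compare the word case-insensitively, the tag exactly
    return pair[0].lower() == target[0].lower() and pair[1] == target[1]

def tag_starts_with(pair, prefix):
    # Ignore the word; match any tag that begins with the prefix
    return pair[1].startswith(prefix)

tagged = [("It", "PPS"), ("Federal", "JJ"), ("funds", "NNS")]

print(my_any(tagged, ("federal", "JJ"), same_ignoring_case))  # True
print(my_any(tagged, "NN", tag_starts_with))                  # True
print(("federal", "JJ") in tagged)                            # False
```

The last line is False because a plain membership test compares the tuples exactly, including case.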

If you could be more specific about the problem you want to solve, it would be easier to give concrete advice (e.g., what should the return type be: Boolean, String, Tuple?). Hope this helps.

Cheers!

Upvotes: 0

nneonneo

Reputation: 179422

I would use a set instead. Then you can use the in operator efficiently:

wlist = set([('It', 'PPS'), ('says', 'VBZ'), ('that', 'CS'), ('``', '``'), ('in', 'IN'), ('the', 'AT'), ('event', 'NN'), ('Congress', 'NP'), ('does', 'DOZ'), ('provide', 'VB'), ('this', 'DT'), ('increase', 'NN'), ('in', 'IN'), ('federal', 'JJ'), ('funds', 'NNS'), ("''", "''"), (',', ','), ('the', 'AT'), ('State', 'NN-TL'), ('Board', 'NN-TL'), ('of', 'IN-TL'), ('Education', 'NN-TL'), ('should', 'MD'), ('be', 'BE'), ('directed', 'VBN'), ('to', 'TO'), ('``', '``'), ('give', 'VB'), ('priority', 'NN'), ("''", "''"), ('to', 'IN'), ('teacher', 'NN'), ('pay', 'NN'), ('raises', 'NNS'), ('.', '.')])

print(('federal', 'JJ') in wlist)  # prints True
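One thing to keep in mind: a set drops duplicate pairs and does not preserve word order, so it suits membership tests but not rebuilding the sentence. A small sketch of keeping both, assuming you still need the original list (the names tagged and lookup are illustrative):

```python
# Original tagged sentence; ('in', 'IN') appears twice
tagged = [('in', 'IN'), ('federal', 'JJ'), ('funds', 'NNS'), ('in', 'IN')]

# Build the set once and reuse it for repeated membership tests
lookup = set(tagged)

print(('federal', 'JJ') in lookup)  # True
print(len(tagged), len(lookup))     # 4 3
```

The length difference shows the duplicate pair collapsing into a single set entry.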

Upvotes: 2
