James

Reputation: 1

Context of the most frequent words in a corpus in Python

After using the following function to find the 10 most frequent words in a corpus (using Python), I have to compare the contexts of these ten words across the different subcategories of that corpus.

from collections import defaultdict
import string

def meest_freq(mycorpus):
    # stopList is assumed to be defined elsewhere (a collection of stop words)
    woorden = mycorpus.words()
    zonderhoofdletters = [word.lower() for word in woorden]
    filtered = [word for word in zonderhoofdletters if word not in stopList]
    # Strip punctuation characters from every token
    no_punct = ["".join(ch for ch in s if ch not in string.punctuation) for s in filtered]
    D = defaultdict(int)
    for word in no_punct:
        if word:  # skip tokens that consisted only of punctuation
            D[word] += 1
    # Words sorted from most to least frequent
    popular_words = sorted(D, key=D.get, reverse=True)
    top10 = popular_words[:10]  # the ten most frequent words (indices 0-9)
    # The message is Dutch for "The 10 most frequent words are: ... and ..."
    print("De 10 meest frequente woorden zijn: " + ", ".join(top10[:-1]) + " en " + top10[-1])
    return popular_words

I wanted to use the following code to do so:

def context(cat):
    words = popular_words[:10]
    context = words.concordance()
    print context

Unfortunately, I keep getting "AttributeError: 'str' object has no attribute 'concordance'". Does anyone know why I can't use the result of my first block of code in the second def? I thought that using a return statement would make this work.

Upvotes: 0

Views: 759

Answers (1)

Karl Knechtel

Reputation: 61526

"Does anyone know why I can't use the result of my first block of code in the second def? I thought by using a return-statement it should be able to work."

Because functions do not return variables; they return values.

The popular_words that you use in context does not come from meest_freq; it comes from a global variable defined somewhere else. Inside meest_freq, popular_words is a local variable. This follows from the rule that if you assign to a name inside a function, that name is local unless you say otherwise with the global statement. In context there is no assignment to popular_words, so Python looks for a global with that name instead. That global contains something you don't expect (judging by the error message, a string), perhaps because you are testing the functions in the interpreter and it was left around from testing and fixing a previous version of the functions.
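The scoping rule can be seen in a minimal sketch. The function names make_list and use_global and the string value are hypothetical, chosen only to imitate a stale global left over from an earlier interpreter session:

```python
# A global that happens to share its name with a local used elsewhere
# (hypothetical value, standing in for something left over from testing).
popular_words = "stale value from an earlier session"

def make_list():
    # Assigning to the name inside the function creates a local;
    # the global popular_words is not touched.
    popular_words = ["a", "b", "c"]
    return popular_words

def use_global():
    # No assignment here, so Python looks the name up as a global.
    return popular_words

result = make_list()
print(type(result).__name__)
print(type(use_global()).__name__)
```

Calling make_list returns the list as a value, but the global name still refers to the old string, which is what use_global sees.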

Please do not try to use a global variable for this. You have already learned, correctly, that the way to get information out of a function is via its return value. The counterpart is that the way to get information into a function is to pass it in as a parameter. Just as meest_freq knows about the corpus because you passed it in as mycorpus, context should be made aware of the popular words by receiving them as a parameter.

Somewhere you must have code that calls both of these functions. That code should take the value that was returned from meest_freq, and pass it to context, the same way that it passed the corpus to meest_freq.
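As a rough sketch of that calling code, assuming simplified stand-ins for both functions: here meest_freq takes a plain list of tokens rather than a corpus object, and context collects a few neighbouring words around each popular word instead of calling any real concordance method. The width parameter and the sample sentence are invented for illustration.

```python
from collections import Counter

def meest_freq(words):
    # Simplified stand-in: rank the tokens of a plain list by frequency.
    counts = Counter(word.lower() for word in words)
    return [word for word, _ in counts.most_common()]

def context(words, popular_words, width=2):
    # popular_words arrives as a parameter, not through a global.
    lowered = [w.lower() for w in words]
    found = {}
    for target in popular_words[:10]:
        # Collect a small window of tokens around every occurrence.
        found[target] = [
            " ".join(words[max(0, i - width): i + width + 1])
            for i, w in enumerate(lowered) if w == target
        ]
    return found

tokens = "the cat sat on the mat and the dog sat too".split()
popular = meest_freq(tokens)         # keep the returned value ...
snippets = context(tokens, popular)  # ... and pass it in explicitly
print(popular[:2])
print(snippets["sat"])
```

The caller owns the value in between: whatever meest_freq returns is bound to a name at the call site and handed to context, so neither function depends on what happens to be defined globally.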

Alternatively, if you passed the corpus to context, you could call meest_freq inside it. It's hard to know the right way to organize things because of your names; I have no idea what cat is supposed to mean, what context has to do with anything, or what concordance means here.
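That alternative might look like the following sketch, again assuming a simplified meest_freq that works on a plain list of tokens (not the original corpus object) and a context that only returns the top words:

```python
from collections import Counter

def meest_freq(words):
    # Hypothetical simplified version operating on a plain token list.
    counts = Counter(w.lower() for w in words)
    return [w for w, _ in counts.most_common()]

def context(words):
    # The corpus is the only input; the ranking is computed inside,
    # so popular_words is an ordinary local here.
    popular_words = meest_freq(words)
    return popular_words[:10]

top = context("a b a c a b".split())
print(top)
```

Which arrangement is preferable depends on whether the caller needs the ranking for anything else; if it does, passing it in avoids computing it twice.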

Upvotes: 1
