M4cJunk13

Sorted key value lambda not working

I've got a loop over `sorted()` that stops sorting once I add the `not in stop` condition to the key. Basically, the sort order I had before is lost as soon as I include the NLTK stopwords. Can anyone point out what I did wrong?

I have now included everything in the code for better reference.

EDITED:

```python
import nltk  # the code below calls nltk.word_tokenize, nltk.bigrams and nltk.FreqDist
from nltk.corpus import stopwords
import string

stop = stopwords.words('english') + list(string.punctuation)
with open('review_text_all.txt', encoding="utf-8") as f:
    raw = (f.read().lower()
           .replace("'", "").replace("\\", "")
           .replace(",", "").replace("\ufeff", ""))

tokens = nltk.word_tokenize(raw)

bgs = nltk.bigrams(tokens)

fdist = nltk.FreqDist(bgs)
for (k, v) in sorted(fdist.items(), key=lambda x: (x[1] not in stop),
                     reverse=True):
    print(k, v)
```

Here is my result with `not in stop`:

```
('or', 'irish') 3
('put', 'one') 1
('was', 'repealed') 1
('please', '?') 6
('contact', 'your') 2
('wear', 'sweats') 1
```

And without `not in stop`:

```
('white', 'people') 4362
('.', 'i') 3734
('in', 'the') 2880
('of', 'the') 2634
('to', 'be') 2217
('all', 'white') 1778
```

As you can see, the sort works, but only once I remove the `not in stop`.


Answers (1)

Cédric Julien

The `key` parameter of the built-in `sorted` function is a function that tells Python which value (an attribute or value derived from each item of the list) to sort on.
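
For example, with made-up items shaped like `fdist.items()` (each one a `((word1, word2), count)` pair; the sample values are hypothetical), a key that returns the count makes `sorted` order the bigrams by frequency:

```python
# Hypothetical sample data shaped like fdist.items(): ((word1, word2), count)
items = [(("or", "irish"), 3), (("white", "people"), 4362), (("in", "the"), 2880)]

# The key function maps each item to the value that gets compared: here the count x[1]
by_count = sorted(items, key=lambda x: x[1], reverse=True)
print(by_count)
# [(('white', 'people'), 4362), (('in', 'the'), 2880), (('or', 'irish'), 3)]
```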

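In your case, your function returns `True` or `False`, which is not a useful value to sort on :) Worse, `x[1]` is the count (an integer), so `x[1] not in stop` is `True` for every item, and since Python's sort is stable, the items simply stay in their original order.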
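
A minimal check of this behaviour, using hypothetical stand-ins for `fdist.items()` and the stop list (not the real review data): the key is the same for every item, so nothing gets reordered.

```python
# Hypothetical stand-ins for fdist.items() and the NLTK stop list
items = [(("or", "irish"), 3), (("white", "people"), 4362), (("in", "the"), 2880)]
stop = ["in", "the", "or", "."]

# x[1] is the integer count, which never equals a stop-word string,
# so this key evaluates to True for every item
keys = [x[1] not in stop for x in items]
print(keys)  # [True, True, True]

# All keys are equal and Python's sort is stable (also with reverse=True),
# so the "sorted" list comes back in its original order
result = sorted(items, key=lambda x: (x[1] not in stop), reverse=True)
print(result == items)  # True
```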

EDIT:

From what I understand of what you want to achieve, you need to add a filter step before (or after) the sort that removes from your list the bigrams containing a word from your stop-words list.

Something like this:

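```python
# Each item is ((word1, word2), count): x[0] holds the words, x[1] the count.
# Keep only bigrams whose two words are both outside the stop list,
# then sort the remaining pairs by count, highest first.
for (k, v) in sorted(filter(lambda x: x[0][0] not in stop and x[0][1] not in stop,
                            fdist.items()),
                     key=lambda x: x[1], reverse=True):
    print(k, v)
```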
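
To try the filter-then-sort idea without NLTK or the review file, here is a standard-library sketch. It assumes `collections.Counter` over `zip(tokens, tokens[1:])` as a stand-in for `nltk.FreqDist(nltk.bigrams(tokens))`; the `top_bigrams` helper, the token list and the stop set are hypothetical, and it uses a list comprehension instead of `filter`.

```python
from collections import Counter


def top_bigrams(tokens, stop):
    """Count adjacent word pairs, drop pairs containing a stop word, sort by count."""
    # Stand-in for nltk.FreqDist(nltk.bigrams(tokens))
    counts = Counter(zip(tokens, tokens[1:]))
    # Test both words of each bigram (bg[0] and bg[1]), not the count
    kept = [(bg, n) for bg, n in counts.items()
            if bg[0] not in stop and bg[1] not in stop]
    # Sort the surviving (bigram, count) pairs by count, highest first
    return sorted(kept, key=lambda x: x[1], reverse=True)


# Hypothetical sample input
tokens = "white people wear sweats . white people in the city . wear sweats".split()
stop = {"in", "the", "."}

result = top_bigrams(tokens, stop)
for k, v in result:
    print(k, v)
# ('white', 'people') 2
# ('wear', 'sweats') 2
# ('people', 'wear') 1
```

Storing the stop words in a `set` rather than a list makes each membership test constant-time on average, which helps when the stop list (NLTK's English list plus punctuation) is checked for every bigram.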
