Reputation: 429
I've got a sorted() call that stops sorting once I add the 'not in stop' condition. Basically, the sorting I had before is lost now that I include the NLTK stopwords. Can anyone point out what I did wrong?
I have now included everything in the code for better reference.
EDITED:
import nltk
from nltk.corpus import stopwords
import string
stop = stopwords.words('english') + list(string.punctuation)
with open('review_text_all.txt', encoding="utf-8") as f:
    raw = (f.read().lower().replace("'", "").replace("\\", "")
           .replace(",", "").replace("\ufeff", ""))
tokens = nltk.word_tokenize(raw)
bgs = nltk.bigrams(tokens)
fdist = nltk.FreqDist(bgs)
for (k,v) in sorted(fdist.items(), key=lambda x: (x[1] not in stop),
                    reverse=True):
    print(k,v)
Here is my result with 'not in stop':
('or', 'irish') 3
('put', 'one') 1
('was', 'repealed') 1
('please', '?') 6
('contact', 'your') 2
('wear', 'sweats') 1
And without 'not in stop':
('white', 'people') 4362
('.', 'i') 3734
('in', 'the') 2880
('of', 'the') 2634
('to', 'be') 2217
('all', 'white') 1778
As you can see, the sort works, but only once I remove the 'not in stop'.
Upvotes: 1
Views: 1148
Reputation: 80761
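The key parameter of the sorted function is a function that tells Python which value (an attribute or value derived from each item of the list) to sort on.
In your case, the key function returns True or False, which are not really good values to sort on :) Worse, x[1] is the count, an integer, which is never in your list of stop-word strings, so the key is True for every item. All items tie, and the sort leaves them in their original order.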
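To make that concrete, here is a minimal sketch using a few made-up (bigram, count) pairs in place of fdist.items() and a tiny hypothetical stop list. It only illustrates why a key that is always True cannot reorder anything.

```python
# Toy data standing in for fdist.items(): (bigram, count) pairs.
items = [(("or", "irish"), 3), (("put", "one"), 1), (("please", "?"), 6)]
stop = ["or", "the", "?"]  # hypothetical, tiny stop list

# x[1] is the integer count, which is never in a list of strings,
# so the key is True for every item.
keys = [x[1] not in stop for x in items]

# All keys tie and sorted() is stable, so the input order is kept.
by_bool = sorted(items, key=lambda x: x[1] not in stop, reverse=True)

# Sorting on the count itself gives the intended ordering.
by_count = sorted(items, key=lambda x: x[1], reverse=True)

print(keys)
print(by_bool == items)
print(by_count)
```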
EDIT:
From what I understand of what you want to achieve, you need to add a filter step before (or after) the sort that removes from your list the items containing words from your "stop words" list.
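Something like this:
# x[0] is the bigram (a tuple of two words), x[1] is its count.
# This drops any bigram that contains at least one stop word.
for (k,v) in sorted(filter(lambda x: not any(w in stop for w in x[0]), fdist.items()),
                    key=lambda x: x[1], reverse=True):
    print(k,v)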
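For anyone who wants to try the filter-then-sort idea without the review file or the NLTK corpus, here is a small self-contained sketch. It assumes a hypothetical four-entry stop set and a made-up sentence, and it uses collections.Counter over zip(tokens, tokens[1:]) as a stand-in for nltk.FreqDist(nltk.bigrams(tokens)). Each item is (bigram, count), so the stop-word test looks at the words in the bigram and the sort key looks at the count.

```python
from collections import Counter

# Hypothetical tiny stop set; the question builds the real one from
# stopwords.words('english') + list(string.punctuation).
stop = {"the", "in", "of", "."}

# Made-up sample text, already lowercased and split into tokens.
text = "white people in the city . white people of the town like big city"
tokens = text.split()

# Pair each token with its successor, as nltk.bigrams(tokens) would.
bigram_counts = Counter(zip(tokens, tokens[1:]))

# Step 1: keep only bigrams in which neither word is a stop word.
kept = [(bg, n) for bg, n in bigram_counts.items()
        if not any(w in stop for w in bg)]

# Step 2: sort what is left by count, highest first.
ranked = sorted(kept, key=lambda x: x[1], reverse=True)

for bg, n in ranked:
    print(bg, n)
```

Using a set for stop makes each membership test cheap, which may matter with a few thousand distinct bigrams. Swap any for all if you only want to drop bigrams made up entirely of stop words.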
Upvotes: 4