I want to collect counts over the tokens and find the most frequent token. The code I wrote for that does not work, so I commented it out. Can anyone help me with this problem?
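!pip install wget
import wget
import spacy
url = 'https://raw.githubusercontent.com/dirkhovy/NLPclass/master/data/moby_dick.txt'
wget.download(url, 'moby_dick.txt')
# one stripped line of the book per list element; the with block closes the file
with open('moby_dick.txt', encoding='utf8') as f:
    documents = [line.strip() for line in f]
print(documents[:])
# spaCy v3 removed the 'en' shortcut, so load the small English model by its full name
# (install it first with: python -m spacy download en_core_web_sm)
nlp = spacy.load('en_core_web_sm')
tokens = [[token.text for token in nlp(sentence)] for sentence in documents[:200]]
tokens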
# from collections import Counter
# Counter = Counter(tokens)
# most_occur = Counter.most_common(10)
# print(most_occur)
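The line
tokens = [[token.text for token in nlp(sentence)] for sentence in documents[:200]]
creates a list of lists of tokens, one inner list per sentence. Counter cannot count those inner lists because lists are unhashable, so Counter(tokens) raises TypeError: unhashable type: 'list'.
What you want is a flat list of tokens.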
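To see why the nesting matters, here is a small self-contained sketch. The two hand-written token lists are made-up stand-ins for spaCy's output, not data from the book.

```python
from collections import Counter

# Two toy "sentences", nested the same way as the spaCy comprehension output
nested = [["Call", "me", "Ishmael", "."], ["Some", "years", "ago", "."]]

try:
    Counter(nested)  # each element is a list, and lists are unhashable
except TypeError as err:
    error_message = str(err)
    print("TypeError:", error_message)

# Flatten first: one list holding every token from every sentence
flat = [token for sentence in nested for token in sentence]
top = Counter(flat).most_common(1)
print(top)  # [('.', 2)]
```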
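Try:
import itertools
from collections import Counter
# flatten the per-sentence token lists into one list (list() so it can be reused), then count
tokens = list(itertools.chain.from_iterable(
    [token.text for token in nlp(sentence)] for sentence in documents[:200]))
most_occur = Counter(tokens).most_common(10)
print(most_occur)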
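If you don't need the flat list itself, a Counter can also be filled incrementally with Counter.update, one sentence at a time. This is a sketch of that alternative, not the answer's method: count_tokens is a hypothetical helper, and str.split stands in for the spaCy tokenizer so the example runs without a downloaded model.

```python
from collections import Counter
from typing import Callable, Iterable, List


def count_tokens(sentences: Iterable[str],
                 tokenize: Callable[[str], List[str]]) -> Counter:
    """Hypothetical helper: count tokens sentence by sentence."""
    counts = Counter()
    for sentence in sentences:
        # update() adds one to the count of each token in the iterable
        counts.update(tokenize(sentence))
    return counts


# str.split is a stand-in tokenizer for this sketch (assumption);
# with spaCy you could pass: lambda s: [t.text for t in nlp(s)]
sample = ["the whale and the sea", "the whale"]
counts = count_tokens(sample, str.split)
print(counts.most_common(2))  # [('the', 3), ('whale', 2)]
```

With spaCy, the call would be count_tokens(documents[:200], lambda s: [t.text for t in nlp(s)]).most_common(10).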