How do I collect counts over tokens and find the most frequent token?

Question

I want to collect counts over the tokens and see which token is the most frequent. The code I wrote does not work, so I commented it out. Can anyone help me with this problem?

! pip install wget

import wget
url = 'https://raw.githubusercontent.com/dirkhovy/NLPclass/master/data/moby_dick.txt'
wget.download(url, 'moby_dick.txt')


documents = [line.strip() for line in open('moby_dick.txt', encoding='utf8').readlines()]
print(documents[:])

import spacy

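# Note: the 'en' shortcut works in spaCy 2.x; spaCy 3+ needs a full model name such as 'en_core_web_sm'.
nlp = spacy.load('en')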

tokens = [[token.text for token in nlp(sentence)] for sentence in documents[:200]]
tokens

# from collections import Counter 

# Counter = Counter(tokens) 
# most_occur = Counter.most_common(10) 
# print(most_occur)

gelonida · Accepted Answer

The code

tokens = [[token.text for token in nlp(sentence)] for sentence in documents[:200]]

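creates a list of lists of tokens. Counter needs hashable items, and lists are not hashable, so Counter(tokens) raises a TypeError.

What you want is a flat list of tokens.

Try: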

import itertools
tokens = itertools.chain.from_iterable(
    [[token.text for token in nlp(sentence)] for sentence in documents[:200]])
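
With the tokens flattened, the counting step from the question should work. Here is a minimal sketch of that step, assuming a small hand-written nested list (nested_tokens, hypothetical sample data) in place of the spaCy output so it runs without downloading a model. The result is stored in token_counts rather than in a variable named Counter, because reusing that name would shadow the class.

```python
import itertools
from collections import Counter

# Hypothetical stand-in for the spaCy output: one list of tokens per line.
nested_tokens = [
    ["Call", "me", "Ishmael", "."],
    ["the", "whale", ",", "the", "whale", "!"],
    ["the", "sea", "."],
]

# Flatten the list of lists into a single stream of tokens.
flat_tokens = itertools.chain.from_iterable(nested_tokens)

# Count the tokens; avoid naming the result "Counter", which would shadow the class.
token_counts = Counter(flat_tokens)

# Show the single most frequent token together with its count.
print(token_counts.most_common(1))
```

Note that itertools.chain.from_iterable returns an iterator, which can be consumed only once. If you need the flat tokens again after counting, wrap the call in list(...) first.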
