user14994313


How to extract big docs from NLTK corpus

I have downloaded the Reuters corpus from the NLTK library and want to store 10 random sentences, each with more than 50 tokens, in a new variable.

I have already downloaded the corpus and written the following code, but it runs continuously without stopping:

import random
import nltk
nltk.download('reuters')
nltk.download('punkt')
from nltk.corpus import reuters

sample_data = []

for i in range(len(reuters.sents())):
  sent = random.choice(reuters.sents())
  if len(sent) <= 50:     # Skip the sentence if it has 50 or fewer elements
    pass
  else:
    sample_data.append(sent)
  while len(sample_data) == 10:
    break

Is there a more efficient way of writing this so that the program actually finishes?

Upvotes: 0

Views: 175

Answers (1)

tejas e

Reputation: 74

Try using if instead of while. In your code the break only exits the inner while loop, not the for loop, so the for loop keeps running over the whole corpus. With an if, the break stops the for loop as soon as sample_data holds 10 sentences:

if len(sample_data) == 10:
    break
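
For reference, a minimal sketch of the whole loop with that change applied. Caching reuters.sents() in a local variable is an extra step beyond the fix above, assumed here because rebuilding the corpus view on every iteration is what makes the original loop so slow:

import random
import nltk
nltk.download('reuters')
nltk.download('punkt')
from nltk.corpus import reuters

sents = reuters.sents()            # create the corpus view once; calling reuters.sents() each iteration re-reads the corpus
sample_data = []

for _ in range(len(sents)):
    sent = random.choice(sents)
    if len(sent) > 50:             # keep only sentences with more than 50 tokens
        sample_data.append(sent)
    if len(sample_data) == 10:     # if instead of while, so the break exits the for loop
        break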

Upvotes: 1
