user14994313


How to extract big docs from NLTK corpus

I have downloaded the Reuters corpus from the NLTK library and want to store 10 random sentences, each with more than 50 tokens, in a new variable.

I have already downloaded the corpus and written the following code, but it runs continuously without stopping:

import random
import nltk
nltk.download('reuters')
nltk.download('punkt')
from nltk.corpus import reuters

sample_data = []

for i in range(len(reuters.sents())):
  sent = random.choice(reuters.sents())
  if len(sent) <= 50:     # Skip the sentence if it has 50 or fewer elements
    pass
  else:
    sample_data.append(sent)
  while len(sample_data) == 10:
    break

Is there a more efficient way of writing this so that the program actually finishes?

Upvotes: 0

Views: 175

Answers (1)

tejas e

Reputation: 74

Try using if instead of while. In your code the break only exits the inner while loop, not the for loop, so the for loop keeps running over the whole corpus. With an if, the break stops the for loop as soon as sample_data holds 10 sentences:

if len(sample_data) == 10:
    break
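
For reference, a minimal sketch of the whole loop with that change applied. Caching reuters.sents() in a local variable is an extra step beyond the fix above, assumed here because rebuilding the corpus view on every iteration is what makes the original loop so slow:

import random
import nltk
nltk.download('reuters')
nltk.download('punkt')
from nltk.corpus import reuters

sents = reuters.sents()            # create the corpus view once; calling reuters.sents() each iteration re-reads the corpus
sample_data = []

for _ in range(len(sents)):
    sent = random.choice(sents)
    if len(sent) > 50:             # keep only sentences with more than 50 tokens
        sample_data.append(sent)
    if len(sample_data) == 10:     # if instead of while, so the break exits the for loop
        break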

Upvotes: 1
