Reputation: 3829
I am using google_ngram_downloader to read the Google Books Ngram datasets.
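Code:
from google_ngram_downloader import readline_google_store

# readline_google_store yields one (fname, url, records) tuple per data file
fname, url, records = next(readline_google_store(ngram_len=1))
# print the first five records of that first file
for x in range(0, 5):
    print(next(records))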
This reads the data files one by one, starting from 0, 1, ... and then a, b, c, ... z. Each call to next(readline_google_store(ngram_len=1)) yields the next file in that order. I want to go directly to a particular file, such as 'a' or 'b', instead of stepping through them in sequence.
Required: read only the 1-gram dataset for the letter 'a'.
Upvotes: 4
Views: 1995
Reputation: 25371
One way is to pass the indices argument explicitly. Use this line to get just the ngrams of length 1 that start with a:
fname, url, records = next(readline_google_store(ngram_len=1,indices='a'))
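If you want more than one letter, or only the first few rows of each file, a small helper that walks the (fname, url, records) tuples can take care of both. The sketch below assumes that indices accepts a list of index strings such as ['a', 'b'] (a bare 'a' appears to work because it is iterated as a one-character sequence). The names head_per_file and fake_store are hypothetical. fake_store is an in-memory stand-in, so the example runs without downloading anything, and its file-name pattern is illustrative only.

```python
from itertools import islice


def head_per_file(store, n=5):
    """Collect the first n records from each (fname, url, records) tuple.

    `store` is any iterable shaped like the output of readline_google_store:
    one tuple per data file, where `records` is an iterator over its rows.
    """
    heads = {}
    for fname, url, records in store:
        # islice stops after n items without consuming the rest of the iterator
        heads[fname] = list(islice(records, n))
    return heads


def fake_store(indices):
    """Hypothetical stand-in for readline_google_store (no network access)."""
    for index in indices:
        fname = "googlebooks-eng-all-1gram-{}.gz".format(index)  # illustrative name
        url = "http://example.invalid/" + fname
        # build the rows eagerly so each file's records are fixed at yield time
        records = iter(["{}-word{}".format(index, i) for i in range(100)])
        yield fname, url, records


if __name__ == "__main__":
    for fname, rows in head_per_file(fake_store(["a", "b"]), n=3).items():
        print(fname, rows)
```

With the real package, the equivalent call would be head_per_file(readline_google_store(ngram_len=1, indices=['a', 'b'])), assuming the list form of indices behaves as described above.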
Upvotes: 1