Reputation: 673
I want to use readline_google_store (which is a generator) to create a database of its records. My code is like:
import sqlite3
import re
from google_ngram_downloader import readline_google_store
import time

def Main():
    try:
        start_time = time.time()
        p = re.compile(r'^[a-z]*$', re.IGNORECASE)
        el = 'abcdefghijklmnopqrstuvwxyz'
        # Open database connection
        con = sqlite3.connect('test.db')
        # Create a cursor object
        cur = con.cursor()
        for l in el:
            fname, url, records = next(readline_google_store(ngram_len=1, indices=l))
            for r in records:
                #time.sleep(0.0001)
                if r.year >= 2000:
                    w = r.ngram.lower()
                    if p.match(w):
                        cur.execute('SELECT ngram, match_counts FROM Unigram WHERE ngram = ?', (w,))
                        results = cur.fetchone()
                        # print results
                        if not results:  # or if results == None
                            cur.execute("INSERT INTO Unigram VALUES(?, ?);", (w, r.match_count))
                            con.commit()
                        else:
                            match_count_sum = results[1] + r.match_count
                            cur.execute("UPDATE Unigram SET match_counts = ? WHERE ngram = ?;", (match_count_sum, w))
                            con.commit()
    except sqlite3.Error as e:
        if con:
            con.rollback()
        print('There was a problem with sql')
    finally:
        if con:
            con.close()
        end_time = time.time()
        print("--- It took %s seconds ---" % (end_time - start_time))

if __name__ == '__main__':
    Main()
The input is a record in this format:
(ngram, year, match_count, page_count)
Disregarding the year and page_count, I want to have a table with records like (ngram, match_count_sum), where match_count_sum is the sum of all of the match_count values at various years.
The error that pops up is:
requests.exceptions.ChunkedEncodingError: ("Connection broken: error(54, 'Connection reset by peer')", error(54, 'Connection reset by peer'))
I tried time.sleep(0.0001) to adjust thread scheduling and allow the socket I/O to finish, but then I get a time-out error.
How can I fix this issue?
Upvotes: 0
Views: 291
Reputation: 26464
Since SQLite is reading and writing locally, your error is most likely an issue with the remote API. That is usually the slow part of an application like this, but I would expect reads from it to be blocking.
"Connection reset by peer" usually indicates a network error somewhere. So the question is where the reset is coming from: it could be a firewall, an API limitation, or the like. I can't tell where it is coming from based on the info given, but I can give you an initial checklist.
This is largely outside the control of your code, but you may be able to handle the failure more gracefully.
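As a rough sketch of what handling it more gracefully could look like, assuming the reset is transient and that simply trying again is acceptable: the helper below retries a call with exponential backoff. The name retry_call and its parameters are hypothetical, not part of requests or google_ngram_downloader, and the delays are placeholders you would need to tune.

```python
import time


def retry_call(func, exceptions, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(); if it raises one of `exceptions`, wait and try again.

    Hypothetical helper for illustration only. The delay doubles after
    each failed attempt (base_delay, 2 * base_delay, 4 * base_delay, ...).
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise  # out of retries: let the caller see the error
            sleep(base_delay * 2 ** (attempt - 1))


if __name__ == "__main__":
    # Stand-in for a download that fails twice before succeeding.
    state = {"calls": 0}

    def flaky_download():
        state["calls"] += 1
        if state["calls"] < 3:
            raise ConnectionError("Connection reset by peer")
        return "done"

    print(retry_call(flaky_download, (ConnectionError,), base_delay=0.1))
```

In your loop you would probably wrap the work for one letter in a function (say a hypothetical load_letter(l)) and call retry_call(lambda: load_letter(l), (requests.exceptions.ChunkedEncodingError,)). A broken stream most likely cannot be resumed, so each retry starts that file over. For that reason it is safer to sum the counts for a file in memory and write them to SQLite only once the file has been read completely, otherwise a retry would add the same counts twice.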
Upvotes: 1