Elm662

Reputation: 673

Convert records from a Python generator into a database

I want to use readline_google_store (which is a generator) to build a database from the records it yields. My code looks like this:

import sqlite3
import re
from google_ngram_downloader import readline_google_store
import time

def Main():
    try:
        start_time = time.time()
        p = re.compile(r'^[a-z]*$', re.IGNORECASE)
        el = 'abcdefghijklmnopqrstuvwxyz'
        # Open database connection
        con = sqlite3.connect('test.db')
        # create a cursor object
        cur = con.cursor()
        for l in el:
            fname, url, records = next(readline_google_store(ngram_len=1, indices=l))
            for r in records:
                #time.sleep(0.0001)
                if r.year >= 2000:
                    w = r.ngram.lower()
                    if p.match(w):
                        cur.execute('SELECT ngram, match_counts FROM Unigram WHERE ngram = ?', (w,))
                        results = cur.fetchone()
                        # print results
                        if not results: # or if results == None
                            cur.execute("INSERT INTO Unigram VALUES(?, ?);", (w, r.match_count))
                            con.commit()
                        else:
                            match_count_sum = results[1] + r.match_count
                            cur.execute("UPDATE Unigram SET match_counts = ? WHERE ngram = ?;", (match_count_sum, w))
                            con.commit()
    except sqlite3.Error as e:
        if con:
            con.rollback()
            print('There was a problem with sql: %s' % e)
    finally:
        if con:
            con.close()
    end_time = time.time()
    print("--- It took %s seconds ---" % (end_time - start_time))

if __name__ == '__main__':
    Main() 

Each input record has this format:

(ngram, year, match_count, page_count)

Disregarding year and page_count, I want a table with records like (ngram, match_count_sum), where match_count_sum is the sum of match_count for that ngram across all years.

The error that pops up is:

requests.exceptions.ChunkedEncodingError: ("Connection broken: error(54, 'Connection reset by peer')", error(54, 'Connection reset by peer'))

I tried time.sleep(0.0001) to adjust thread scheduling and allow the socket I/O to finish, but then I get a time-out error.

How can I fix this issue?

Upvotes: 0

Views: 291

Answers (1)

Chris Travers

Reputation: 26464

Since SQLite is reading and writing locally, your error looks like a problem with the remote API. Reading from it will usually be the slow part of your application, but I would expect that read to be blocking.

"Connection reset by peer" usually indicates a network error somewhere. So the question is where the reset is coming from (it could be a firewall, an API limitation, or the like). I can't tell where from the information given, but I can give you an initial checklist.

  1. Is it always hitting the same record? Maybe there is a server-side limitation on the other end?
  2. Is it happening randomly? Is there something going on with the network and you just need to handle the failure gracefully?
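
To tell these two cases apart, it helps to record how far the stream got before it broke. The following is a minimal sketch, not part of google_ngram_downloader: read_until_failure and flaky_stream are hypothetical names, and the generic IOError stands in for the error you are seeing so that the example runs on its own. In your script you would pass it records and the exception class from your traceback (requests.exceptions.ChunkedEncodingError).

```python
def read_until_failure(records, exceptions=(IOError,)):
    """Consume an iterable and report where it stopped.

    Returns (count, last_record, error); error is None if the
    iterable was exhausted without raising.
    """
    count = 0
    last_record = None
    try:
        for record in records:
            count += 1
            last_record = record
    except exceptions as exc:
        return count, last_record, exc
    return count, last_record, None


def flaky_stream():
    # Stand-in for a download that breaks after three records.
    for i in range(3):
        yield ('word%d' % i, 2000 + i, 10, 1)
    raise IOError('Connection reset by peer')


count, last_record, error = read_until_failure(flaky_stream())
print('stopped after %d records, last=%r, error=%r'
      % (count, last_record, error))
```

If the count and last record are the same on every run, a server-side limit is the more likely cause; if they vary, the network is the more likely culprit.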

This is outside the direct control of your code, but you may be able to handle the failure more gracefully.
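
One possible way to do that is to retry the download and only write to SQLite once a file has been read completely. This is a sketch under assumptions, not a tested fix for your setup: sum_counts_with_retry, save_totals, Record and demo_stream are hypothetical names, IOError stands in for requests.exceptions.ChunkedEncodingError, and the regex filter from your code is left out for brevity. Because your loop commits after every record, simply restarting the generator would count the early records twice, so the sketch sums in memory and resets the totals on each attempt.

```python
import sqlite3
import time
from collections import Counter, namedtuple


def sum_counts_with_retry(open_stream, retries=3, delay=1.0,
                          exceptions=(IOError,)):
    """Sum match_count per ngram, restarting the stream on failure.

    open_stream is a zero-argument callable returning a fresh
    iterable of records with ngram, year and match_count attributes.
    """
    for attempt in range(1, retries + 1):
        totals = Counter()  # reset so a restarted stream is not double counted
        try:
            for r in open_stream():
                if r.year >= 2000:
                    totals[r.ngram.lower()] += r.match_count
            return totals
        except exceptions:
            if attempt == retries:
                raise  # give up after the last attempt
            time.sleep(delay * attempt)  # simple linear backoff


def save_totals(con, totals):
    """Write all totals in one transaction.

    Replaces any existing row for the same ngram.
    """
    with con:
        con.execute('CREATE TABLE IF NOT EXISTS Unigram '
                    '(ngram TEXT PRIMARY KEY, match_counts INTEGER)')
        con.executemany('INSERT OR REPLACE INTO Unigram VALUES (?, ?)',
                        totals.items())


# Stand-ins for the real records: the first attempt fails part-way through.
Record = namedtuple('Record', 'ngram year match_count page_count')
calls = {'n': 0}


def demo_stream():
    calls['n'] += 1
    yield Record('Apple', 2001, 5, 1)
    if calls['n'] == 1:
        raise IOError('Connection reset by peer')
    yield Record('apple', 2002, 7, 1)
    yield Record('apple', 1999, 100, 1)


totals = sum_counts_with_retry(demo_stream, delay=0)
con = sqlite3.connect(':memory:')
save_totals(con, totals)
rows = con.execute('SELECT ngram, match_counts FROM Unigram').fetchall()
print(rows)
```

In your script, open_stream could be something like lambda: next(readline_google_store(ngram_len=1, indices=l))[2], called once per letter. Writing one batch per file also avoids the SELECT, INSERT or UPDATE, and commit that your loop currently runs for every record.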

Upvotes: 1
