Tewfik Ghariani

Reputation: 311

java.sql.SQLExceptionPyRaisable on the second attempt connecting to Athena using Django

I am using the Python module PyAthenaJDBC to query Athena through the provided JDBC driver. Here is the link: https://pypi.python.org/pypi/PyAthenaJDBC/

I have been facing a persistent issue: I get this Java error whenever I use the Athena connection twice in a row.

In fact, I was able to connect to Athena, show databases, create new tables, and even query their content. I am building an application with Django and running its server to use Athena. However, I have to restart the server for the Athena connection to work again.

Here is a glimpse of the class I have built:

import os
import configparser
import pyathenajdbc


#Get aws credentials for the moment
aws_config_file = '~/.aws/config'

Config = configparser.ConfigParser()
Config.read(os.path.expanduser(aws_config_file))

access_key_id = Config['default']['aws_access_key_id']
secret_key_id = Config['default']['aws_secret_access_key']


BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
athena_jdbc_driver_path = BASE_DIR + "/lib/static/AthenaJDBC.jar"
log_path = BASE_DIR + "/lib/static/queries.log"

class PyAthenaLoader():
    def __init__(self):
        pyathenajdbc.ATHENA_JAR = athena_jdbc_driver_path

    def connecti(self):
        self.conn = pyathenajdbc.connect(
                              s3_staging_dir="s3://aws-athena-query-results--us-west-2",
                              access_key=access_key_id,
                              secret_key=secret_key_id,
                              #profile_name = "default",
                              #credential_file = aws_config_file,
                              region_name="us-west-2",
                              log_path=log_path,
                              driver_path=athena_jdbc_driver_path
                              )


    def databases(self):
        dbs = self.query("show databases;")
        return dbs

    def tables(self, database):
        tables = self.query("show tables in {0};".format(database))
        return tables

    def create(self):
        self.connecti()
        try:
            with self.conn.cursor() as cursor:
                cursor.execute(
                    """CREATE EXTERNAL TABLE IF NOT EXISTS sales4 (
                        Day_ID date,
                        Product_Id string,
                        Store_Id string, 
                        Sales_Units int,
                        Sales_Cost float, 
                        Currency string
                    ) 
                    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
                    WITH SERDEPROPERTIES (
                        'serialization.format' = '|',
                        'field.delim' = '|',
                        'collection.delim' = 'undefined',
                        'mapkey.delim' = 'undefined'
                    ) LOCATION 's3://athena-internship/';
                """)

                res = cursor.description
        finally:
            self.conn.close()
        return res


    def query(self, req):
        self.connecti()     
        try:
            with self.conn.cursor() as cursor:
                cursor.execute(req)
                print(cursor.description)
                res = cursor.fetchall()
        finally:
            self.conn.close()
        return res


    def info(self):
        # List every attribute of the pyathenajdbc module with its value
        res = []
        for name in dir(pyathenajdbc):
            res.append(name + ' = ' + str(getattr(pyathenajdbc, name)))
        return res

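Example of usage:

from django.shortcuts import render
# jdbc is the module that contains the PyAthenaLoader class above

def test(request):
    athena = jdbc.PyAthenaLoader()
    res = athena.query('Select * from sales;')
    return render(request, 'test.html', {'data': res})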

Works just fine! However, refreshing the page causes this error:

[Image: error screenshot]

Note that I am using a local .jar file. I thought that would solve the issue, but I was wrong. Even if I remove the path of the JDBC driver and let the module download it from S3, the error persists:

File "/home/tewfikghariani/.virtualenvs/venv/lib/python3.4/site-packages/pyathenajdbc/connection.py", line 69, in __init__
    ATHENA_CONNECTION_STRING.format(region=self.region_name, schema=schema_name), props)
jpype._jexception.java.sql.SQLExceptionPyRaisable: java.sql.SQLException: No suitable driver found for jdbc:awsathena://athena.us-west-2.amazonaws.com:443/hive/default/

Furthermore, when I run the module on its own, it works just fine. When I open multiple connections inside my view before rendering the template, that works just fine as well.

I guess the issue is related to the Django view: once one of the views has connected to Athena, the next connection is no longer possible and the error is raised unless I restart the server.
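One way to test this guess is to record which thread each connection attempt runs on. The sketch below is a stdlib-only illustration, not part of PyAthenaJDBC: `log_thread` and `fake_query` are hypothetical names, and `fake_query` only stands in for `PyAthenaLoader.query`. Django's development server typically handles each request in its own thread, so seeing different thread names across page loads would be consistent with the JVM bridge keeping some state per thread.

```python
import functools
import threading

seen_threads = []  # thread names, in call order


def log_thread(func):
    """Record the name of the thread that runs each call (hypothetical helper)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        seen_threads.append(threading.current_thread().name)
        return func(*args, **kwargs)
    return wrapper


@log_thread
def fake_query(sql):
    # Stand-in for PyAthenaLoader.query; no Athena call is made here.
    return sql


# Simulate two page loads handled on two different threads.
for n in range(2):
    t = threading.Thread(target=fake_query,
                         args=("show databases;",),
                         name="request-%d" % n)
    t.start()
    t.join()
```

Applying the same decorator to the real `query` method and printing `seen_threads` after two page loads would show whether the failing call runs on a different thread from the first one.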

Any help? If other details are missing I will provide them immediately.

Upvotes: 2

Views: 1007

Answers (1)

Tewfik Ghariani

Reputation: 311

Update: After I posted the issue on GitHub, the author fixed the problem and released a new version that works perfectly. It was a multi-threading problem with JPype.

Question answered!

ref : https://github.com/laughingman7743/PyAthenaJDBC/pull/8
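For anyone stuck on an older release, one possible workaround is to send every Athena call through a single long-lived worker thread, so the JVM is always used from the same thread. This is a minimal sketch under that assumption, not the fix from the pull request above: `run_on_athena_thread` and `current_thread_id` are hypothetical names, and `current_thread_id` only stands in for a real call such as `PyAthenaLoader().query(sql)`.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# One long-lived worker thread; every submitted call runs on it.
_athena_executor = ThreadPoolExecutor(max_workers=1)


def run_on_athena_thread(func, *args, **kwargs):
    """Run func on the single worker thread and block until it returns."""
    return _athena_executor.submit(func, *args, **kwargs).result()


def current_thread_id():
    # Stand-in for a real call such as PyAthenaLoader().query(sql).
    return threading.get_ident()


first = run_on_athena_thread(current_thread_id)
second = run_on_athena_thread(current_thread_id)
```

In a view, this would look like `res = run_on_athena_thread(athena.query, 'Select * from sales;')`. The trade-off is that all queries are serialized on one thread, so upgrading to the fixed version is the better option when possible.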

Upvotes: 2
