manisha
manisha

Reputation: 607

PyAthenaJDBC query string formatting issue

I am trying to use the PyAthenaJDBC library to make a python script that will query an S3 bucket using Athena. The library is pretty great but I'm having issues with formatting.

I'm constructing the queries as strings in a separate function and passing the query string back to cursor.execute(query).

The query string has quotes in it, and looks like this:

SELECT day, elb_name
    ,COUNT (*) AS c
    ,100.0 * (
        1.0 - (
            SUM (
                CASE
                    WHEN elb_response_code LIKE '5%' THEN 1
                    ELSE 0
                END
            ) / cast(COUNT (*) as double)
        )
    ) AS success_rate
    ,100.0 * SUM (
        CASE
            WHEN backend_processing_time < 0.1 THEN 1
            ELSE 0
        END
    ) / cast(COUNT (*) as double) AS t_lt_pt1
    ,100.0 * SUM (
        CASE
            WHEN backend_processing_time < 1 THEN 1
            ELSE 0
        END
    ) / cast(COUNT (*) as double) AS t_lt_1
    ,100.0 * SUM (
        CASE
            WHEN backend_processing_time < 5 THEN 1
            ELSE 0
        END
    ) / cast(COUNT (*) as double) AS t_lt_5
    ,100.0 * SUM (
        CASE
            WHEN backend_processing_time < 10 THEN 1
            ELSE 0
        END
    ) / cast(COUNT (*) as double) AS t_lt_10
    FROM elb_logs_raw_native_part
    WHERE year = '2017' AND
    month = '03' AND
    elb_name is not NULL AND
    elb_name != ''
    GROUP BY day, elb_name
    ORDER BY c DESC

This will cause a error on the second single quotation mark in the `LIKE '5%' statement.

ValueError: unsupported format character ''' (0x27) at index 186

I can avoid the error and successfully execute the query by changing this line in the library https://github.com/laughingman7743/PyAthenaJDBC/blob/master/pyathenajdbc/formatter.py#L115

from return (operation % kwargs).strip() to return (operation).strip() At this point in the code, operation == query (pasted above) and kwargs == {}

My specific question is, am I structuring my query incorrectly? Or is this something I don't understand about the native string formatting where trying to do a replace with an empty dictionary is a bad idea?

Upvotes: 0

Views: 213

Answers (1)

cco
cco

Reputation: 6281

If the query is used as a string to be formatted, you need to double any % characters you want to survive as literal % characters.

Instead of WHEN elb_response_code LIKE '5%' THEN 1, use WHEN elb_response_code LIKE '5%%' THEN 1. After the line you indicate in formatter.py runs, %% will have been transformed into a single %.

Upvotes: 1

Related Questions