Power BI Athena Incremental Refresh

I have been successfully using Power BI’s incremental refresh daily with a MySQL data source. However, I can't get this configured with AWS Athena, because seemingly the latter interprets the values in the required parameters RangeStart and RangeEnd as strings. Since the data source is around 50 million rows I’d rather avoid querying this from scratch every day.

In this video from Guy in a Cube, you can clearly see that the query sent by Power BI to Azure has a convert to datetime2 function - something like this is presumably missing for Athena/Presto, which needs the type constructor TIMESTAMP in order to do datetime comparisons (https://stackoverflow.com/a/38041684/3675679), and of course incremental refresh must be based on datetime fields. I am using the datetime field adv_date for the incremental load.

Here is what the M query looks like in Power Query Editor:

= Table.SelectRows(#"Removed Columns1", each [adv_date] >= RangeStart and [adv_date] < RangeEnd)

And here is the resultant error message in Athena:

Your query has the following errors:SYNTAX_ERROR: line 1:1: Incorrect number of parameters: expected 2 but found 0

Whilst this is how Athena interprets the query:

    select "col1", "col2", "adv_date" 
    from "AwsDataCatalog"."test"."test_table" 
    where "adv_date" >= ? and "adv_date" < ?

I have contacted Power BI support without success. Does anyone have a workaround for this by any chance? Happy to provide more info if needed.

Upvotes: 1

Answers (4)

BillaD

Reputation: 98

I think you are trying to fix Filtered Rows step, but might be able to achieve incremental load by fixing Step 1 - Source (running actual direct query to Athena)

Pasting my answer on this from another question thread:

I think I have managed to achieve the "Incremental Load" in Power BI using Athena. This (still) does not allow you to view Native query but you can still make Power BI manipulate the direct query to implement it.

To avoid full scan of S3 data in Athena - you have to enable Partitions in your dataset. Without going off topic, once you partition the S3 data via Athena you can then pin point the datasets with days/months/years without scanning your whole dataset.

Once you do that, you can achieve the Incremental Load by running Direct Queries as mentioned in this video (20:00 onwards) and achieve resource-efficient query execution.

The final query will look something like -

Odbc.Query("dsn=Simba Athena", 
    "SELECT * FROM tablename 
    WHERE year >= " & DateTime.ToText(RangeStart, "yyyy") & "
AND month >= " & DateTime.ToText(RangeStart, "MM") & "
AND day >= " & DateTime.ToText(RangeStart, "dd") & "
AND year <= " & DateTime.ToText(RangeEnd, "yyyy") & "
AND month <= " & DateTime.ToText(RangeEnd, "MM") & "
AND day <= " & DateTime.ToText(RangeEnd, "dd") & "
")

EDIT #1: OR simply

    Odbc.Query("dsn=Simba Athena", 
        "SELECT * FROM tablename 
        WHERE dt >= '" & DateTime.ToText(RangeStart, "yyyy/MM/dd") & "'
        AND dt <= '" & DateTime.ToText(RangeEnd, "yyyy/MM/dd") & "'
    ")

Upvotes: 1

Rafael Ruivo

Reputation: 1

Direct query also works for me, but I eventually just moved the filters to a view inside Athena - PBI can't be trusted to handle stuff like this, sadly.

Anyway, there is a (sort of) workaround for M queries, in case someone else need it: I found out that if you add certain steps before the filter, Power BI will not try any query folding, therefore not messing up the SQL it sends to Athena. In my case, I added a duplicated column and renamed it. PBI will, of course, still load all the data, because of course it will, but it will dump it once que query finishes fetching data. This way at least we can save space on the file, even if loading time stays the same.

Sorry if I sound frustrated in this answer - the reason is that I am incredibly frustrated with Power BI.

Upvotes: 0

ᐅdevrimbaris

Reputation: 796

A guy from Microsoft directed me to use the Odbc.Query rather than to use Odbc.Datasource. Here is an example from the URL he sent:

let
Source = Odbc.Query("dsn=Google BigQuery", "SELECT line_of_business, category_group FROM masterdata.item_d WHERE line_of_business in ('" & LOB & "')")
in
Source

I have tried this and it worked, maybe you can use this also.

Upvotes: 1

Ricky McMaster

Reputation: 4647

So I have an answer of sorts - I don't believe that it is currently possible to set up Athena as an incremental source in Power BI, using a standard connection.

However, it is possible to do this via a dataflow, with the caveat that for our environment it was not particularly fast. However it does work.

Upvotes: 0

Power BI Athena Incremental Refresh

Answers (4)

Related Questions