user9809879
user9809879

Reputation:

How to store aws athena output from python script in excel?

I am querying from aws athena using python script and pyathena library and I'm getting the correct output in the form of table.

Output

Now the problem is I want to store the output in excel.

Can anyone suggest me, using python script how i can store output in Excel?

Here is the code that I am using for querying in aws athena:

from pyathena import connect
import os
import pandas as pd
%matplotlib inline

conn = connect(aws_access_key_id='*****',
                 aws_secret_access_key='*****',
                 s3_staging_dir='s3://****/',
                 region_name='us-west-1')

cursor = conn.cursor()
%time cursor.execute("SELECT * from my_table;")

Thanks in advance...

Upvotes: 1

Views: 1606

Answers (3)

davmoser
davmoser

Reputation: 31

Depending on how precise you need to insert the data into the sheet, you might checkout OpenPyXl - https://openpyxl.readthedocs.io

I use that when I need to insert the Athena results into specific cells and/or worksheets in the workbook. For when I need to be more precise than displaying a table of results. You can reference individual cells like worksheet['A53'] = 12345

Upvotes: 0

Vishal Upadhyay
Vishal Upadhyay

Reputation: 811

You can use pandas to query and save data in excel:

data = pd.read_sql("SELECT * from my_table;",conn) 
data.to_excel('data.xlsx') 

Upvotes: 1

Guy
Guy

Reputation: 12901

The output to Excel is not limited to creating the xlsx file, which you can also write as csv and let Excel load the csv file.

You can create multiple sheets using something like:

from pandas import ExcelWriter
def save_xls(list_dfs, dfs_names, xls_path):
    with ExcelWriter(xls_path) as writer:
        for df,name in zip(list_dfs, dfs_names):
            df.to_excel(writer,name)
        writer.save()

and then you can call the function with some transformation of your data, such as pivot tables and even colors:

save_xls(
    (raw.style.format("{:,.0f}"), 
     actual_table.style.format("{:,.0f}"), 
     diff_table.style.applymap(_color_red_or_green).format("{:,.0f}"), 
     ratio_table.style.applymap(_color_red_yellow_or_green).format("{:.3f}")),
    ('Raw',
    'Actuals',
    'Diff',
    'Ratio'),
    results_with_format.xlsx')

for example, formatting with three colors based on the value of the cells:

def _color_red_yellow_or_green(val):
    color = 'red' if val > 0.1 else 'yellow' if val > 0.05 else 'green'
    return 'background-color: %s' % color

Upvotes: 0

Related Questions