Reputation: 724
I am writing a Lambda function in AWS that pulls data from Redshift. The purpose of this function is to run each day and send out email notifications of its output (in this case, I want the output to be a table).
Here is my current function. I am able to see the list of rows from the query output, but now I want to save that in table format, or at least print out the full table/output. I'm very new to AWS, so I was wondering: how do I store it as a new table in Redshift (or anywhere else in AWS) so I can send it to people?
Code:
import json
import psycopg2
import boto3
credential = {
    'dbname': 'main',
    'host_url': 'dd.us-west-1.redshift.amazonaws.com',
    'port': '5439',
    'user': 'private',
    'password': '12345678'
}

redshift_role = {
    'dev': 'arn:aws:lambda:us-west-1:15131234566:function:test_function'
}
def lambda_handler(event, context):
    #client = boto3.client('redshift-data')
    # Build the connection string from the credential dict and connect to Redshift
    conn_string = "dbname='{}' port='{}' user='{}' password='{}' host='{}'"\
        .format(credential['dbname'], credential['port'], credential['user'],
                credential['password'], credential['host_url'])
    con = psycopg2.connect(conn_string)

    sql_query = """with
tbl as (
select
case
when (sa.parentid like '001i0000023STBY%' or sa.ultimate_parent_account__c like '001i0000023STBY%') --Parent OR Ultimate Parent is <Department of Defense>
then sa.id
else
coalesce(sa.ultimate_parent_account__c, sa.parentid, sa.id) end as cust_id,
(select name from salesforce.account where id=cust_id) as cust_name,
sa.name as acct_name,
sa.id as acct_id,
sa.parentid,
(select name from salesforce.account where id=sa.parentid) as par_name,
(select name from salesforce.account where id=sa.ultimate_parent_account__c) as ult_par_name,
so.id as opp_id,
so.name as opp_name,
so.stagename as stg_name,
so.type as opp_type,
so.Manager_Commit__c as mgr_commit,
so.renewal_risk__c as opp_risk,
so.isclosed as cls,
so.finance_date__c as fin_date,
DATEPART(QUARTER,so.finance_date__c) as Q,
DATEPART(QUARTER,so.closedate) as Q_cls,
DATEPART(QUARTER,so.subscription_start_date__c) as Q_ren_due,
so.Total_NARR__c as arr,
so.NARR__c as fin_nacv,
so.churn__c as fin_churn,
so.Renewal_Amount__c as ren_amt,
so.Available_to_Renew_ARR__c as avl_ren_arr,
so.estimated_narr__c as nacv,
so.bi_detect_nacv__c as bi_detect,
so.bi_recall_nacv__c as bi_recall,
so.bi_stream_nacv__c as bi_stream,
so.bi_dfaws_nacv__c as bi_dfaws,
so.bi_o365_nacv__c as bi_o365,
so.bi_services_nacv__c as bi_svcs,
sp.name as pr_name,
sp.family as pr_family,
sp.sbqq__subscriptiontype__c as pr_type,
sol.product_code__c as oli_code,
sol.sbqq__quoteline__c as qli_id,
sol.quantity as qty,
sca.serial__c as ca_name,
(select name from salesforce.product2 where id = sca.product__c ) as ca_pr_name,
sca.mode_updater__c as ca_mode,
sca.updater_last_seen__c as ca_last_seen,
sca.software_version__c as ca_sw_version,
sca.total_hosts__c as ca_tot_hosts,
sca.active_hosts__c as ca_active_hosts,
sca.X95_Host_Total__c as ca_x95_hosts_tot,
sca.traffic__c as ca_traffic,
sca.uiconfig__c as ca_uiconfig
from
salesforce.opportunity so
join
salesforce.account sa on
so.accountid = sa.id
join salesforce.user su on
so.ownerid = su.id
join salesforce.opportunitylineitem sol on
so.id = sol.opportunityid
join salesforce.product2 sp on
sol.product2id = sp.id
join salesforce.customasset__c sca on
so.id = sca.opportunity__c
where
so.isdeleted = false
and sa.isdeleted = false
and sol.isdeleted = false
order by
Q
)
select * from
(select
tbl.acct_name as acct,
tbl.ca_name,
tbl.ca_pr_name,
tbl.ca_mode,
date(tbl.ca_last_seen) as ca_last_seen,
tbl.ca_sw_version,
tbl.ca_tot_hosts,
tbl.ca_active_hosts,
tbl.ca_x95_hosts_tot,
tbl.ca_traffic,
tbl.ca_uiconfig
from
tbl
where
tbl.stg_name like 'Closed Won%'
and tbl.arr is not null
group by
tbl.acct_name,
tbl.opp_id,
tbl.ca_name,
tbl.ca_pr_name,
tbl.ca_mode,
tbl.ca_last_seen,
tbl.ca_sw_version,
tbl.ca_tot_hosts,
tbl.ca_active_hosts,
tbl.ca_x95_hosts_tot,
tbl.ca_traffic,
tbl.ca_uiconfig) df
WHERE ca_last_seen >= DATEADD(MONTH, -3, GETDATE())"""
    # Execute the query once and collect every row of the result
    with con.cursor() as cur:
        cur.execute(sql_query)
        rows = cur.fetchall()
    print(rows)

    con.close()
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }
I added a section that writes to the /tmp directory, but I'm having trouble writing to my S3 bucket; the upload leads to a timeout error.
Updated portion of code below:
# Requires `import csv` at the top of the file and an S3 client, e.g. s3 = boto3.client('s3')
with con.cursor() as cur:
    # Execute the query and fetch every row of the result
    cur.execute(sql_query)
    res = cur.fetchall()
    print(res)

    # Save the query results to a CSV file in Lambda's writable /tmp directory
    with open('/tmp/Processlist.csv', 'w', newline='') as fp:
        myFile = csv.writer(fp)
        myFile.writerows(res)

    #s3.upload_file('/tmp/Processlist.csv', 'data-lake-020192', 'Processlist.csv')
    #con.close()
Upvotes: 1
Views: 776
Reputation: 11042
There are several ways to do this. The first and least efficient is to insert the data into a table using INSERT INTO ... VALUES (...). This provides the data as part of the SQL text and therefore it is processed by the query compiler, moved from the leader node to the compute nodes, and then stored in the table. This process is inefficient, potentially stresses the leader node, and is generally frowned upon. However, if you are only loading a small number of rows and it runs infrequently (like when the database is generally lightly loaded), then this can work fine. Just remember there is a limit to how long a SQL statement can be, but if you are anywhere close to it you are likely loading too much data via this path. 100 rows of 5 columns should be fine.
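For a result set this small, a minimal sketch of that path might look like the following. It reuses the connection and query from your handler; the target table daily_report is a hypothetical name and would need to exist in Redshift with columns matching your query's output.

from psycopg2.extras import execute_values

# Minimal sketch of the INSERT ... VALUES path. `con` and `sql_query` are the
# connection and query from the question; `daily_report` is a hypothetical
# target table whose columns match the query's output.
with con.cursor() as cur:
    cur.execute(sql_query)
    rows = cur.fetchall()

    # execute_values expands the VALUES list client-side; fine for ~100 rows,
    # not appropriate for bulk loads.
    execute_values(
        cur,
        "INSERT INTO daily_report (acct, ca_name, ca_pr_name, ca_mode, ca_last_seen, "
        "ca_sw_version, ca_tot_hosts, ca_active_hosts, ca_x95_hosts_tot, ca_traffic, "
        "ca_uiconfig) VALUES %s",
        rows,
    )
con.commit()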
The best way, though it takes more coding, is to write the data out to an S3 file (or files, if the data is large) and then COPY it into the desired table. A CSV file is simple to generate and human-readable. This process also gives you a record of the table contents for each day for any future need (debugging, auditing, etc.).
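A rough sketch of that path, building on the /tmp CSV you already write. The bucket name is the one from your commented-out upload; the table name and the IAM role ARN are placeholders to replace with your own, and the role attached to the cluster needs read access to the bucket.

import csv
import boto3

# Write the fetched rows to a CSV in Lambda's /tmp, upload it to S3, then COPY
# it into a Redshift table. `rows` and `con` come from the handler above;
# `daily_report` and the IAM role ARN are placeholders.
with open('/tmp/Processlist.csv', 'w', newline='') as fp:
    csv.writer(fp).writerows(rows)

s3 = boto3.client('s3')
s3.upload_file('/tmp/Processlist.csv', 'data-lake-020192', 'Processlist.csv')

copy_sql = """
    COPY daily_report
    FROM 's3://data-lake-020192/Processlist.csv'
    IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-copy-role'
    FORMAT AS CSV;
"""
with con.cursor() as cur:
    cur.execute(copy_sql)
con.commit()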
Alternatively, you could just save the data to S3 and use Redshift Spectrum to access it there. This may be a good choice for very large amounts of data and/or data that is rarely used. In most cases, I would expect that having the data native in Redshift (COPY from S3) is the way to go.
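A sketch of the one-time Spectrum setup, assuming the daily CSVs land under a single S3 prefix. The external schema, Glue database, table, column list, and IAM role ARN below are all placeholders; the column list is an abbreviated example based on your query's output.

# One-time setup for the Spectrum route: an external schema backed by the Glue
# data catalog, plus an external table over the S3 prefix holding the daily
# CSVs. All names and the IAM role ARN are placeholders.
con.autocommit = True  # CREATE EXTERNAL TABLE cannot run inside a transaction

with con.cursor() as cur:
    cur.execute("""
        CREATE EXTERNAL SCHEMA IF NOT EXISTS reports_ext
        FROM DATA CATALOG
        DATABASE 'reports'
        IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-spectrum-role'
        CREATE EXTERNAL DATABASE IF NOT EXISTS
    """)
    cur.execute("""
        CREATE EXTERNAL TABLE reports_ext.daily_report (
            acct            varchar(255),
            ca_name         varchar(255),
            ca_last_seen    date,
            ca_tot_hosts    int,
            ca_active_hosts int
        )
        ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
        STORED AS TEXTFILE
        LOCATION 's3://data-lake-020192/daily_report/'
    """)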
Coding any of these in Lambda is straightforward - just make the calls to the services and issue the SQL as needed.
Upvotes: 1