Reputation: 2638
I have the following Python script that downloads two files from an S3-compatible service, merges them, and uploads the output to another bucket.
import time
import boto3
import pandas as pd

timestamp = int(time.time())
conn = boto3.client('s3')

# Download both source files into the script's working directory
conn.download_file('segment', 'segment.csv', 'segment.csv')
conn.download_file('payment', 'payments.csv', 'payments.csv')

paymentsfile = 'payments.csv'
segmentsfile = 'segment.csv'
outputfile = 'payments_merged_' + str(timestamp) + '.csv'

# Merge the two CSVs on their shared ID column
csv_payments = pd.read_csv(paymentsfile, dtype={'ID': float})
csv_segments = pd.read_csv(segmentsfile, dtype={'ID': float})
csv_payments = csv_payments.merge(csv_segments, on='ID')

# Write the merged result to disk and upload it to the backup bucket
open(outputfile, 'a').close()
csv_payments.to_csv(outputfile)
conn.upload_file(outputfile, 'backup', outputfile)
However, when I execute the script it stores the files in the folder of my script. For security reasons I would like to prevent this from happening. I could delete the files after the script has finished, but let's assume my script is located in the folder /app/script/. This means that for a short time, while the script is being executed, someone could open the URL example.com/app/script/payments.csv and download the file. What is a good solution for that?
Upvotes: 0
Views: 1047
Reputation: 13166
In fact, pandas.read_csv lets you read from a buffer or bytes object, so you can do everything in memory. Either run this script on an instance or, even better, run it as an AWS Lambda function if the files are small.
import io
import time
import boto3
import pandas as pd

timestamp = int(time.time())
paymentsfile = 'payments.csv'
segmentsfile = 'segment.csv'
outputfile = 'payments_merged_' + str(timestamp) + '.csv'

s3 = boto3.client('s3')
# Stream both objects straight from S3; get_object returns a body that
# pandas can read directly, so nothing touches the local disk
payment_obj = s3.get_object(Bucket='payment', Key=paymentsfile)
segment_obj = s3.get_object(Bucket='segment', Key=segmentsfile)
csv_payments = pd.read_csv(payment_obj['Body'], dtype={'ID': float})
csv_segments = pd.read_csv(segment_obj['Body'], dtype={'ID': float})
csv_merge = csv_payments.merge(csv_segments, on='ID')

# Serialize the merged frame into an in-memory buffer and upload it
buffer = io.BytesIO(csv_merge.to_csv().encode('utf-8'))
s3.upload_fileobj(buffer, 'bucket_name', outputfile)
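If you would rather not manage the buffer yourself, the same upload can also be done with put_object. A sketch, assuming the same client, merged frame, and bucket name as above:

csv_bytes = csv_merge.to_csv().encode('utf-8')
s3.put_object(Bucket='bucket_name', Key=outputfile, Body=csv_bytes)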
Upvotes: 1
Reputation: 2798
The simplest way would be to modify the configuration of your web server so that it does not serve the directory you are writing to, or to write to a directory that isn't served. For example, a common practice is to use /scr for this type of thing. You would need to modify the permissions of the user your web server runs under to ensure it has access to /scr.
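If you keep writing to disk, Python's tempfile module can place the file in a directory outside the web root. A minimal sketch, reusing the names from the question and assuming /scr exists and is writable by the script:

import os
import tempfile

# Create the output file outside the web root; delete=False keeps the
# file around after the with-block so it can still be uploaded
with tempfile.NamedTemporaryFile(mode='w', suffix='.csv', dir='/scr', delete=False) as tmp:
    csv_payments.to_csv(tmp)
    tmp_path = tmp.name

conn.upload_file(tmp_path, 'backup', outputfile)
os.remove(tmp_path)  # clean up once the upload has succeeded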
To restrict web server access to the directory you write to, you can use the following in Nginx -
https://serverfault.com/questions/137907/how-to-restrict-access-to-directory-and-subdirs
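The linked rule boils down to denying requests for that location. A minimal sketch, assuming the /app/script/ directory from the question is the one being exposed:

location /app/script/ {
    deny all;
}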
For Apache you can use this example -
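A sketch of the equivalent in Apache 2.4 syntax, again assuming the /app/script directory from the question:

<Directory "/app/script">
    Require all denied
</Directory>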
Upvotes: 1