databasefoe

Reputation: 49

Creating a script to replace values in files when being pushed into S3 bucket

I am using an AWS S3 bucket to upload CSV files containing sensor data. Once a file is uploaded to the bucket, a Python Lambda function is triggered that outputs the 'last update' time as an epoch timestamp and the 'direction' and 'speed' values in JSON format.

The initial upload via CSV

LAST UPDATE DIRECTION SPEED
4/30/2021 9:51:01 AM 146 4.22

The output from lambda function

1619776261 {'DIRECTION': '146', 'SPEED': '4.22'}

To demonstrate how this works, I was thinking of creating a script that replaces the date and values in each uploaded file with random values and automatically pushes the file to the S3 bucket every 30 seconds or 1 minute. However, I am not sure how to go about writing this script, so I would appreciate any insights anyone has to offer! Thanks in advance.

Upvotes: 0

Views: 729

Answers (1)

KnowledgeGainer

Reputation: 1097


Here is one way to structure it. The overview is:

CloudWatch fires a trigger every minute. That trigger invokes a Lambda function that uploads the CSV file to S3. Because the S3 bucket has its own trigger enabled, the upload invokes a second Lambda function, which performs the processing you describe in the question. You can then save that data back or use it for any other purpose.

Before setting up CloudWatch, we first need the first Lambda function, which does the uploading.

To create random data and convert it to a DataFrame:

import pandas as pd
import random
from datetime import datetime
df = pd.DataFrame(data={'LAST UPDATE':str(datetime.now()),'DIRECTION':str(random.randint(100,200)),'SPEED':str(random.random()+random.randint(1,5))[:4]},index=[0])

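Note that str(datetime.now()) does not produce the same timestamp format as the sample file in the question (4/30/2021 9:51:01 AM). If the processing function depends on that format, something like the following might help. It is only a sketch using the standard library; the helper names format_timestamp and make_csv are made up for illustration, and it assumes the file is comma-delimited, which the question does not actually show.

```python
import csv
import io
import random
from datetime import datetime


def format_timestamp(moment):
    """Format a datetime like '4/30/2021 9:51:01 AM' (no zero padding on month, day, hour)."""
    hour12 = moment.hour % 12 or 12
    suffix = "AM" if moment.hour < 12 else "PM"
    return f"{moment.month}/{moment.day}/{moment.year} {hour12}:{moment:%M:%S} {suffix}"


def make_csv(moment, rng=random):
    """Return CSV text with a header row and one row of random sensor values."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["LAST UPDATE", "DIRECTION", "SPEED"])
    writer.writerow([
        format_timestamp(moment),
        rng.randint(100, 200),        # same direction range as the snippet above
        f"{rng.uniform(1, 6):.2f}",   # roughly the same speed range, two decimals
    ])
    return buffer.getvalue()


print(make_csv(datetime.now()))
```

The hour and AM/PM suffix are computed by hand rather than with %I and %p so the result does not depend on the locale or on platform-specific padding flags.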

To store it in S3, you can use the following code inside your first Lambda function. You might have to modify it for your setup.

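import random
import uuid
from datetime import datetime
from io import StringIO

import boto3
import pandas as pd

bucket = 'your-bucket-name'  # placeholder: replace with the name of your bucket
s3_res = boto3.resource('s3')

# Build a one-row DataFrame of random sensor values
df = pd.DataFrame(data={'LAST UPDATE':str(datetime.now()),'DIRECTION':str(random.randint(100,200)),'SPEED':str(random.random()+random.randint(1,5))[:4]},index=[0])

# Write the CSV to an in-memory buffer; index=False keeps only the three data columns
csv_df = StringIO()
df.to_csv(csv_df, index=False)

# Upload under a unique key so each run creates a new file
s3_res.Object(bucket, '{0}.csv'.format(str(uuid.uuid4()))).put(Body=csv_df.getvalue())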
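To run this as a Lambda function, the upload needs to sit inside a handler. A minimal sketch of how that could look is below. The environment variable BUCKET_NAME and the helper upload_csv are assumptions made for the example, and the CSV text is just the sample row from the question, so swap in the text produced from the DataFrame above.

```python
import os
import uuid


def upload_csv(s3_client, bucket, csv_text):
    """Upload csv_text under a random UUID key and return the key."""
    key = f"{uuid.uuid4()}.csv"
    s3_client.put_object(Bucket=bucket, Key=key, Body=csv_text.encode("utf-8"))
    return key


def lambda_handler(event, context):
    import boto3  # imported here so upload_csv can be exercised without AWS access

    # BUCKET_NAME is a hypothetical environment variable set on the function
    bucket = os.environ["BUCKET_NAME"]
    # Placeholder row taken from the question; replace with your random data
    csv_text = "LAST UPDATE,DIRECTION,SPEED\n4/30/2021 9:51:01 AM,146,4.22\n"
    key = upload_csv(boto3.client("s3"), bucket, csv_text)
    return {"bucket": bucket, "key": key}
```

Passing the client into upload_csv keeps the S3 call easy to test with a stand-in object. The function's execution role would also need permission to put objects into the bucket.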

Now, to invoke this function every minute, you can follow this guide: https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/RunLambdaSchedule.html

This will invoke your Lambda function every minute.
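If you would rather set up the schedule from code than from the console, something along these lines should work with boto3. Treat it as a sketch: the function name schedule_every_minute, the rule name, and the target Id are placeholders I made up, and you pass in the clients and your own function ARN. One minute is the smallest interval a rate expression allows, so the 30-second option from the question is not possible with a schedule rule alone.

```python
def schedule_every_minute(events_client, lambda_client, rule_name, function_arn):
    """Create a 1-minute schedule rule and point it at the given Lambda function."""
    rule = events_client.put_rule(
        Name=rule_name,
        ScheduleExpression="rate(1 minute)",
        State="ENABLED",
    )
    # Allow the rule to invoke the function
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId=f"{rule_name}-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule["RuleArn"],
    )
    # Register the function as the rule's target
    events_client.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "uploader", "Arn": function_arn}],
    )
    return rule["RuleArn"]


# Example call (needs boto3 and AWS credentials; the ARN is yours to fill in):
# import boto3
# schedule_every_minute(boto3.client("events"), boto3.client("lambda"),
#                       "upload-every-minute", "<your function ARN>")
```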

Then your S3 trigger will do the processing, as you described in the question.
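You already have that processing function, but for anyone testing the whole pipeline, here is a rough sketch of the conversion it performs on the file contents. It assumes a comma-delimited file and that the timestamps are in UTC, which is consistent with the sample pair in the question. The name parse_sensor_csv is made up for illustration.

```python
import csv
import io
from datetime import datetime, timezone


def parse_sensor_csv(csv_text):
    """Turn each CSV row into (epoch_seconds, {'DIRECTION': ..., 'SPEED': ...})."""
    results = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        stamp = datetime.strptime(row["LAST UPDATE"], "%m/%d/%Y %I:%M:%S %p")
        # Treat the naive timestamp as UTC before converting to epoch seconds
        epoch = int(stamp.replace(tzinfo=timezone.utc).timestamp())
        results.append((epoch, {"DIRECTION": row["DIRECTION"], "SPEED": row["SPEED"]}))
    return results


sample = "LAST UPDATE,DIRECTION,SPEED\n4/30/2021 9:51:01 AM,146,4.22\n"
print(parse_sensor_csv(sample))
# → [(1619776261, {'DIRECTION': '146', 'SPEED': '4.22'})]
```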

Upvotes: 1
