MattyKluch

Reputation: 29

AWS Step Functions/Lambda ability to split multi-tabbed Excel files into CSVs?

For the record, I have very little experience with AWS and its services (I've worked primarily in Azure with my last few clients/employers), but I'm working on a large project where we have a large number of Excel files we need to import into Snowflake on a weekly basis. Our ingestion tool is Fivetran, and we need a place to drop all these files (presumably either S3 or potentially SharePoint), run processing logic on them (making sure the users have entered values correctly), and then split the individual tabs of each Excel workbook into CSVs. Can AWS Step Functions and/or Lambda accomplish this type of thing, especially splitting each tab of the Excel files into individual CSVs? I ask because Fivetran only ingests CSVs from AWS S3. Anyone have any ideas?

Upvotes: -1

Views: 227

Answers (1)

Justin Callison

Reputation: 2219

Step Functions Distributed Map, with Lambda, would likely work well for you here.

You can have a Step Functions state machine that uses Distributed Map to iterate over the files stored in S3 and pass each object key to a Lambda function that processes that one file. Inside the Lambda function there are many ways to do the validation and the splitting. If it were me, I'd use pandas.
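To make the Lambda side concrete, here is a minimal sketch of a per-file handler, assuming pandas (with openpyxl for .xlsx files) and boto3 are packaged with the function. The event shape (`bucket` and `key` fields), the `csv/` output prefix, and the names `csv_key_for_sheet` and `handler` are hypothetical choices for illustration, not something Step Functions or Fivetran prescribes. Validation of user-entered values would go where the comment indicates.

```python
import io
import posixpath
import re


def csv_key_for_sheet(source_key: str, sheet_name: str, out_prefix: str = "csv/") -> str:
    """Build the output S3 key for one worksheet of one workbook."""
    workbook = posixpath.splitext(posixpath.basename(source_key))[0]
    # Replace characters that are awkward in object keys with underscores.
    safe_sheet = re.sub(r"[^A-Za-z0-9_-]+", "_", sheet_name).strip("_")
    return f"{out_prefix}{workbook}/{safe_sheet}.csv"


def handler(event, context):
    """Hypothetical Lambda entry point: one invocation per workbook."""
    # Imported here so the helper above can be used without these packages.
    import boto3
    import pandas as pd

    # Assumed event shape; the state machine would need to pass these fields.
    bucket, key = event["bucket"], event["key"]
    s3 = boto3.client("s3")
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()

    # sheet_name=None makes read_excel return a dict of {sheet name: DataFrame}.
    sheets = pd.read_excel(io.BytesIO(body), sheet_name=None)

    written = []
    for sheet_name, frame in sheets.items():
        # Validation of user-entered values would go here, before writing.
        out_key = csv_key_for_sheet(key, sheet_name)
        csv_bytes = frame.to_csv(index=False).encode("utf-8")
        s3.put_object(Bucket=bucket, Key=out_key, Body=csv_bytes)
        written.append(out_key)

    return {"source": key, "csv_keys": written}
```

With these assumptions, a workbook at `incoming/Sales 2024.xlsx` with a tab named `Q1 Totals` would be written to `csv/Sales 2024/Q1_Totals.csv`. Writing the CSVs under a separate prefix keeps them apart from the raw uploads, so the ingestion tool can be pointed at the CSV prefix only.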

Upvotes: 1
