Kar

Reputation: 1016

How to combine the same files from multiple folders into one file in S3

If I have a file with the same name in multiple folders in S3, how do I combine the copies into one file using boto3 in Python?

Say in a bucket I have

bucket_a
   ts
     ts_folder
          a_date.csv
          b_date.csv
          c_date.csv
          d_date.csv

     ts_folder2
          a_date.csv
          b_date.csv
          c_date.csv
          d_date.csv

I need to combine the files that share a name (for example, the two a_date.csv files) into one file, ignoring the header row in the second file.

I am trying to figure out how to achieve this using boto3 in Python or with AWS tooling.

Upvotes: 1

Views: 5354

Answers (1)

JQadrad

Reputation: 541

Try something like this. I assume you have your AWS credentials set up properly on your system. My suggestion is to first read the lines of the first CSV into a list. For the second CSV, skip the first line (the header) and append the rest. Once you have collected all the lines, join them into a single string so it can be written to an S3 object.

import boto3
# Output will contain the CSV lines
output = []
with open("first.csv", "r") as fh:
    output.extend(fh.readlines())
with open("second.csv", "r") as fh:
    # Skip header
    output.extend(fh.readlines()[1:])

# Combine the lines as string
body = "".join(output)
# Create the S3 client (assuming credentials are setup)
s3_client = boto3.client("s3")
# Write the object
s3_client.put_object(Bucket="my-bucket",
                     Key="combined.csv",
                     Body=body)
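The header-skipping step above can also be written as a small helper that works on text already in memory. This is only a sketch: merge_csv_texts is a hypothetical name, and it assumes every file has the same header row and no quoted fields containing line breaks. It also covers the case where the first file does not end with a newline, which would otherwise glue two rows together.

```python
def merge_csv_texts(texts):
    """Concatenate CSV texts, keeping only the header row of the first one."""
    merged = []
    for index, text in enumerate(texts):
        # splitlines() handles both "\n" and "\r\n" line endings
        lines = text.splitlines()
        if index > 0:
            lines = lines[1:]  # drop the repeated header
        merged.extend(lines)
    # Re-join with "\n" so rows stay separate even if a file lacked a final newline
    return ("\n".join(merged) + "\n") if merged else ""
```

The returned string can be passed as Body to put_object in the same way as in the snippet above.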

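Update: This should help you with the S3 setup. It reads every CSV under the prefix directly from S3 and keeps the header of the first file only.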

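import boto3

# Assumes a named profile "dev" exists in your AWS configuration
session = boto3.session.Session(profile_name='dev')
s3_client = session.client("s3")

bucket = "my-bucket"

# Collect the CSV keys under the prefix (a single call returns at most 1000 keys)
files = []
response = s3_client.list_objects_v2(Bucket=bucket, Prefix="ts/")
for item in response.get('Contents', []):
    if item['Key'].endswith(".csv"):
        files.append(item['Key'])

output = []
for index, file in enumerate(files):
    body = s3_client.get_object(Bucket=bucket,
                                Key=file)["Body"].read()
    # read() returns bytes, so decode before splitting into lines
    lines = body.decode("utf-8").splitlines(keepends=True)
    # Keep the header of the first file only
    output.extend(lines if index == 0 else lines[1:])

# Combine the lines as string
outputbody = "".join(output)
# Write the object
s3_client.put_object(Bucket=bucket,
                     Key="combined.csv",
                     Body=outputbody)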
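The snippet above merges every CSV under the prefix into a single object. If the goal is instead one merged file per file name (all a_date.csv copies together, all b_date.csv copies together, and so on), a rough sketch could look like the following. The function names and the out_prefix parameter are hypothetical, the files are assumed to be UTF-8 with identical headers, and s3_client is expected to be a boto3 S3 client such as the one created above. It uses the list_objects_v2 paginator so prefixes with more than 1000 keys are fully listed.

```python
import posixpath
from collections import defaultdict


def group_keys_by_filename(keys):
    """Map each CSV file name to the keys that share it, in sorted key order."""
    groups = defaultdict(list)
    for key in sorted(keys):
        if key.endswith(".csv"):
            groups[posixpath.basename(key)].append(key)
    return dict(groups)


def list_keys(s3_client, bucket, prefix):
    """Yield every key under the prefix, following pagination."""
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for item in page.get("Contents", []):
            yield item["Key"]


def combine_same_named_csvs(s3_client, bucket, prefix, out_prefix):
    """Write one merged CSV per file name under out_prefix (hypothetical layout)."""
    groups = group_keys_by_filename(list_keys(s3_client, bucket, prefix))
    for name, keys in groups.items():
        lines = []
        for index, key in enumerate(keys):
            body = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
            rows = body.decode("utf-8").splitlines()
            # Keep the header row of the first copy only
            lines.extend(rows if index == 0 else rows[1:])
        s3_client.put_object(Bucket=bucket,
                             Key=out_prefix + name,
                             Body=("\n".join(lines) + "\n").encode("utf-8"))
```

For the layout in the question, a call such as combine_same_named_csvs(s3_client, "bucket_a", "ts/", "combined/") would write combined/a_date.csv, combined/b_date.csv, and so on. Keeping out_prefix outside the input prefix avoids picking up the merged files on a later run.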

Upvotes: 4
