ProEvilz

Reputation: 5455

How can I re-upload a CSV after processing it with fast-csv to S3 with streams?

I'm attempting to download CSV files from S3, perform some transforms on the data (in this example, hard-coding an ID), and then re-upload the result to S3 as a 'processed' version of the file, using streams throughout to avoid running out of memory. fast-csv looked like a good library for this.

Consider the following code:

import { PassThrough } from 'stream';
import { S3Client, GetObjectCommand } from '@aws-sdk/client-s3';
import { Upload } from '@aws-sdk/lib-storage';
import * as csv from 'fast-csv';

const s3Client = new S3Client({ region: 'eu-west-2' });

const getFileFromS3 = async () => {
  const command = new GetObjectCommand({
    Bucket: 'mybucket',
    Key: 'originaldata.csv',
  });
  const getFile = await s3Client.send(command);
  return getFile.Body;
};

const csvParser = csv
  .parse({ headers: true })
  .transform((data) => ({
    ...data,
    id: 'TEST',
  }))
  .on('error', (error) => console.error(error))
  .on('data', (row) => console.log(row))
  .on('end', (rowCount: number) => console.log(`Parsed ${rowCount} rows`));

const fileStream = await getFileFromS3();
const transformationStream = new PassThrough();

fileStream.pipe(csvParser).pipe(transformationStream);

const upload = new Upload({
  client: s3Client,
  params: {
    Bucket: 'mybucket',
    Key: 'processeddata.csv',
    Body: transformationStream,
  },
});
await upload.done();

But when doing this, I get the following error:

TypeError [ERR_INVALID_ARG_TYPE]: The "chunk" argument must be of type string or an instance of Buffer or Uint8Array. Received an instance of Object

It would seem another person has encountered this in the fast-csv repo, but the solution was never given.

Upvotes: 1

Views: 505

Answers (0)
