catch22

Reputation: 1693

How to strip EXIF Metadata from an Image before uploading to S3 without loading the entire file into memory

I have a service that uploads images to AWS S3 using a MultipartFile. These images are later served as public files. There is a security concern: the images might contain sensitive EXIF metadata (e.g., geolocation data) that has to be removed before they are made public.

Problem: I need to strip the EXIF metadata from these images without loading the entire file into memory, as some of the images could be quite large.

My current approach:

private S3Service.S3UploadedFile uploadImage(MultipartFile file) {
    try {
        ByteArrayOutputStream originalOut = stripMetadata(file.getInputStream());

        final PipedInputStream in = new PipedInputStream();
        new Thread(() -> {
            try (final PipedOutputStream newOut = new PipedOutputStream(in)) {
                originalOut.writeTo(newOut);
            } catch (IOException e) {
                // logging and exception handling should go here
            }
        }).start();

        S3File processedS3File = S3File.builderOf(in, file.getContentType())
                .isPublic(true)
                .contentLength((long) originalOut.size())
                .build();

        return s3Service.upload(bucketName, processedS3File);

    } catch (IOException | ImageWriteException | ImageReadException e) {
        throw new RuntimeException("ERR", e); // keep the original exception as the cause
    }
}

public static ByteArrayOutputStream stripMetadata(InputStream imageInputStream)
        throws IOException, ImageWriteException, ImageReadException {

    ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
    ExifRewriter exifRewriter = new ExifRewriter();
    exifRewriter.removeExifMetadata(imageInputStream, outputStream);

    return outputStream;
}

The usage of Piped streams is based on this answer: https://stackoverflow.com/a/23874232/6157949

In the stripMetadata method, I'm using the Apache Commons Imaging library to remove the EXIF metadata. The problem is that it writes to an OutputStream, and I'm currently passing it a ByteArrayOutputStream, which holds the entire stripped image in memory.

What I Need Help With:

I need guidance on how to tweak this approach so that I can strip the EXIF metadata from the image and upload it to S3 without loading the entire file into memory.

Any help or suggestions would be greatly appreciated!

Upvotes: 3

Views: 267

Answers (1)

cyberbrain

Reputation: 5075

You should simply pass the PipedOutputStream directly to exifRewriter.removeExifMetadata instead of creating a buffer on the heap.

I rewrote your code (untested) to make clear what I mean:

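private S3Service.S3UploadedFile uploadImage(MultipartFile file) {
  try {
    ExifRewriter exifRewriter = new ExifRewriter();
    final PipedInputStream in = new PipedInputStream();
    // Connect the pipe before starting the thread, so the upload can never
    // read from a pipe that is not connected yet.
    final PipedOutputStream out = new PipedOutputStream(in);

    // Producer thread: writes the stripped image straight into the pipe
    // while the upload reads from the other end.
    new Thread(() -> {
      try (final PipedOutputStream newOut = out) {
        exifRewriter.removeExifMetadata(file.getInputStream(), newOut);
      } catch (IOException | ImageWriteException | ImageReadException e) {
        // logging and exception handling should go here
      }
    }).start();

    // The stripped size is no longer known up front (there is no originalOut
    // buffer left to measure), so contentLength(...) is left out here.
    S3File processedS3File = S3File.builderOf(in, file.getContentType())
        .isPublic(true)
        .build();

    return s3Service.upload(bucketName, processedS3File);
  } catch (IOException e) {
    throw new RuntimeException("ERR", e);
  }
}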
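
One thing the piped approach does not handle by itself is a failure inside the writer thread: the reader just sees the stream end early, and a truncated image could get uploaded. Below is a minimal, self-contained sketch of the same pipe pattern using only the JDK, with the failure handed over to the reading side. The names PipedTransformDemo, StreamTransform, transformed, upperCase and drain are made up for this illustration; upperCase is only a stand-in for a call like removeExifMetadata, and drain stands in for the S3 upload.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Closeable;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicReference;

class PipedTransformDemo {

    /** Hypothetical stand-in for a call such as removeExifMetadata(in, out). */
    interface StreamTransform {
        void apply(InputStream in, OutputStream out) throws Exception;
    }

    /** Returns a stream the consumer can read while the transform is still writing. */
    static InputStream transformed(InputStream source, StreamTransform transform) throws IOException {
        PipedInputStream pipeIn = new PipedInputStream(64 * 1024);
        // Connected before the thread starts, so the reader never sees an unconnected pipe.
        PipedOutputStream pipeOut = new PipedOutputStream(pipeIn);
        AtomicReference<Exception> failure = new AtomicReference<>();

        Thread producer = new Thread(() -> {
            try {
                transform.apply(source, pipeOut);
            } catch (Exception e) {
                failure.set(e); // recorded before the pipe is closed in finally
            } finally {
                closeQuietly(pipeOut);
                closeQuietly(source);
            }
        }, "pipe-producer");
        producer.setDaemon(true);
        producer.start();

        // At end of stream, turn a recorded producer failure into an IOException.
        return new FilterInputStream(pipeIn) {
            @Override public int read() throws IOException {
                return check(super.read());
            }
            @Override public int read(byte[] b, int off, int len) throws IOException {
                return check(super.read(b, off, len));
            }
            private int check(int result) throws IOException {
                Exception e = failure.get();
                if (result == -1 && e != null) {
                    throw new IOException("transform failed", e);
                }
                return result;
            }
        };
    }

    static void closeQuietly(Closeable c) {
        try { c.close(); } catch (IOException ignored) { /* nothing useful to do */ }
    }

    /** Toy transform for the demo: upper-cases ASCII bytes chunk by chunk. */
    static void upperCase(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            for (int i = 0; i < n; i++) {
                buffer[i] = (byte) Character.toUpperCase((char) buffer[i]);
            }
            out.write(buffer, 0, n);
        }
    }

    /** Demo consumer: reads the stream to the end (an upload would read it instead). */
    static byte[] drain(InputStream in) throws IOException {
        ByteArrayOutputStream collected = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            collected.write(buffer, 0, n);
        }
        return collected.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        InputStream source = new ByteArrayInputStream("image bytes".getBytes(StandardCharsets.US_ASCII));
        try (InputStream in = transformed(source, PipedTransformDemo::upperCase)) {
            System.out.println(new String(drain(in), StandardCharsets.US_ASCII)); // prints IMAGE BYTES
        }
    }
}
```

In uploadImage, the transform would presumably be something like (in, out) -> exifRewriter.removeExifMetadata(in, out), and the returned stream would go to S3File.builderOf. Only the pipe buffer (64 KiB here) sits on the heap at any time, but the stripped size is not known in advance, so the upload has to cope without an exact content length.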

Upvotes: 0
