Shelby S

Reputation: 420

Rails ActiveStorage attachment to existing S3 file

I'm building a PDF parser that fires off a Sidekiq worker to OCR a document stored in S3 and parse data from it. After parsing, the data is stored in the Document model.

How do I attach the existing S3 object to a Document through ActiveStorage (Document.attachment.attach) without uploading it again (via File.open, etc.) and duplicating the file in S3?

Upvotes: 14

Views: 5321

Answers (2)

Troy

Reputation: 5399

This can be done with a slight manipulation of the blob after it is created.

storage.yml

amazon:
  service: S3
  access_key_id: <%= ENV['AWS_ACCESS_KEY_ID'] %>
  secret_access_key: <%= ENV['AWS_SECRET_ACCESS_KEY'] %>
  region: <%= ENV['AWS_REGION'] %>
  bucket: <%= ENV['S3_BUCKET'] %>

app/models/document.rb

class Document < ApplicationRecord
  has_one_attached :pdf
end

rails console

key = "<S3 Key of the existing file in the same bucket that storage.yml uses>"

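# Create an Active Storage blob record that represents the file already on S3.
# byte_size and checksum should describe the real object.
params = {
  filename: "myfile.jpg",
  content_type: "image/jpeg",
  byte_size: 1234,
  checksum: "<Base64 encoding of the MD5 hash of the file's contents>"
}
# **params passes the hash as keyword arguments, which Ruby 3 requires here
blob = ActiveStorage::Blob.create_before_direct_upload!(**params)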

# By default, the blob's key (the S3 key, in this case) is a secure random token.
# Since the file is already on S3, change the key to match the existing object.
blob.update(key: key)

# Now we can create a document object connected to your S3 file
d = Document.create!(pdf: blob.signed_id)

# in your view, you can now use
url_for d.pdf

At this point, you can use the pdf attribute of your Document object like any other active storage attachment.
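If the worker already holds the file's bytes (for example, because it fetched them for OCR), the byte_size and checksum placeholders above can be computed locally. A minimal sketch, assuming the contents are available as a Ruby string; blob_size_and_checksum is a hypothetical helper name, not part of ActiveStorage:

```ruby
require "digest"

# Hypothetical helper (not an ActiveStorage API): derives the two values
# that must describe the real file from its raw bytes.
def blob_size_and_checksum(data)
  {
    byte_size: data.bytesize,                 # size in bytes, not characters
    checksum: Digest::MD5.base64digest(data)  # Base64 of the binary MD5 digest
  }
end

values = blob_size_and_checksum("hello")
puts values[:byte_size]
puts values[:checksum]
```

The result can be merged into the params hash before the blob is created, e.g. params.merge(blob_size_and_checksum(data)).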

Upvotes: 18

michaelmedford

Reputation: 176

Troy's answer worked great for me! I also found it helpful to pull the blob's metadata from the S3 object itself. Something like:

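s3 = Aws::S3::Resource.new(region: "us-west-1")
obj = s3.bucket("my-bucket").object("myfile.jpg")

# The ETag is a quoted hex MD5 (for single-part, non-KMS uploads), while
# ActiveStorage stores the MD5 Base64-encoded, so convert hex -> Base64.
params = {
  filename: File.basename(obj.key),
  content_type: obj.content_type,
  byte_size: obj.size,
  checksum: [[obj.etag.delete('"')].pack("H*")].pack("m0")
}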
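One caveat when reusing the ETag: it is only an MD5 of the contents for objects uploaded in a single part without SSE-KMS or SSE-C encryption. Multipart ETags carry a "-<part count>" suffix and are not MD5 digests, and a checksum that does not match the contents can later surface as an ActiveStorage::IntegrityError when the blob is downloaded. A minimal sketch of a guarded conversion; etag_to_checksum is a hypothetical helper name, not an AWS SDK or ActiveStorage method:

```ruby
# Hypothetical helper: converts an S3 ETag into the Base64 MD5 form that
# ActiveStorage stores, or returns nil when the ETag is not a plain MD5
# (for example, a multipart ETag such as "...-2").
def etag_to_checksum(etag)
  hex = etag.delete('"')                     # S3 wraps the ETag in double quotes
  return nil unless hex.match?(/\A\h{32}\z/) # exactly 32 hex digits = plain MD5
  [[hex].pack("H*")].pack("m0")              # hex -> raw bytes -> Base64, no newline
end

p etag_to_checksum('"5d41402abc4b2a76b9719d911017c592"')   # ETag of an object containing "hello"
p etag_to_checksum('"9b2cf535f27731c974343645a3985328-2"') # multipart-style ETag
```

When the helper returns nil, the checksum has to come from the file's actual bytes instead.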

I only have 46 points so I left this as an answer instead of a comment :/

Upvotes: 14
