Reputation: 169
When I upload a .csv file using boto3 (Python), the last few rows of data get cut off. The file is 268 KB, which should not be too big for a non-multipart upload. Here is my code:
import boto3
s3 = boto3.client('s3')
s3 = boto3.resource('s3')
s3.meta.client.upload_file(report_file.name, 'raw-data-bucket', 'Reports/report.csv')
*These are not the exact bucket and path I've used, but they should be irrelevant in this case.
Any help would be appreciated.
Upvotes: 3
Views: 2862
Reputation: 21
I had this issue because I was performing the upload before closing my file handle. As someone else suggested, closing the file first and then uploading fixed the issue.
This caused the uploaded file to be missing the last chunk:
import json
import boto3

client = boto3.client("s3")
data = {...}
with open("file.json", "w") as f:
    json.dump(data, f)
    # upload happens while the file is still open; the write buffer may not be flushed yet
    client.upload_file("file.json", "my-bucket", "some/prefix/file.json")
This resolved my issue:
import json
import boto3

client = boto3.client("s3")
data = {...}
with open("file.json", "w") as f:
    json.dump(data, f)
# the file is closed (and flushed) once the with block exits
client.upload_file("file.json", "my-bucket", "some/prefix/file.json")
Upvotes: 2
Reputation: 88
Have you closed the file you are uploading to S3 before calling .upload_file()? I had the exact same problem with my *.CSV file uploads and solved it by explicitly closing each file before uploading it. No more truncated *.CSV files.
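For example, a minimal sketch (the filename, bucket, and key are placeholders):
import csv
import boto3

s3 = boto3.client("s3")

f = open("report.csv", "w", newline="")
writer = csv.writer(f)
writer.writerows([["a", 1], ["b", 2]])  # example rows
f.close()  # flushes the buffered rows to disk; only upload after this point

s3.upload_file("report.csv", "my-bucket", "Reports/report.csv")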
Upvotes: 4
Reputation: 13166
Stick with one service (either resource or client).
# Using boto3 service client
import boto3
s3 = boto3.client('s3')
s3.upload_file('your_local_file_path', 'bucket_name', 'prefix_filename_to_s3')
# Using boto3 service resource
import boto3
s3 = boto3.resource('s3')
s3.Object('bucket_name', 'prefix_filename_to_s3').upload_file('your_local_file_path')
Check the content of your "report_file.name" again. upload_file works as GIGO (garbage in, garbage out); it doesn't truncate data.
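One quick way to check (a sketch, using the bucket and key from the question): compare the local file size with the size S3 reports after the upload.
import os
import boto3

client = boto3.client("s3")

local_size = os.path.getsize("report.csv")
client.upload_file("report.csv", "raw-data-bucket", "Reports/report.csv")

# ContentLength is the stored object's size in bytes.
remote_size = client.head_object(
    Bucket="raw-data-bucket", Key="Reports/report.csv"
)["ContentLength"]

print(local_size, remote_size)  # these should match if nothing was truncated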
(Update) After a further check, there is another case that may be related: there is a suggestion that the httpretty module used with boto3 is not thread safe, so you should update your boto3 version and disable threading.
import boto3
from boto3.s3.transfer import TransferConfig

client = boto3.client("s3")
config = TransferConfig(use_threads=False)
client.download_file(Bucket="mybucket", Key="foo/bar.fastq.gz",
                     Filename="bar.fastq.gz", Config=config)
Upvotes: 0
Reputation: 111
Looks like this person had the same issue:
256kb stackoverflow similar problem
Also, they document the multipart upload in boto3 here.
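For reference, boto3's managed transfers switch to multipart automatically once a file exceeds a configurable threshold (8 MB by default), so a 268 KB file is uploaded in a single part either way. A sketch with placeholder names:
import boto3
from boto3.s3.transfer import TransferConfig

client = boto3.client("s3")

# Files above multipart_threshold are split into parts automatically;
# a 268 KB file stays well below the default 8 MB threshold.
config = TransferConfig(multipart_threshold=8 * 1024 * 1024)

client.upload_file("report.csv", "raw-data-bucket",
                   "Reports/report.csv", Config=config)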
Upvotes: 0