Daniel Mahler

Reputation: 8203

access data in sharded JSON files on S3 from Blaze

I am trying to access line delimited JSON data on S3. From my understanding of the docs I should be able to do something like

print data(S3(Chunks(JSONLines))('s3://KEY:SECRET@bucket/dir/part-*.json')).peek()

which throws

BotoClientError: BotoClientError: Bucket names cannot contain upper-case characters when using either the sub-domain or virtual hosting calling format.

I have tried variations on this leading to different errors.

I can get the following to work with local files:

print data(chunks(JSONLines)(map(JSONLines, glob("/home/me/data/*")))).peek()

I am not really sure why the map(JSONLines, glob(...)) wrapper is needed, though.

I do not really understand how to work with type modifiers.

Upvotes: 4

Views: 279

Answers (1)

John Moutafis

Reputation: 23144

The comments section of Example 6 on this page http://www.programcreek.com/python/example/51587/boto.exception.BotoClientError states that:

Bucket names must not contain uppercase characters. We check for this by appending a lowercase character and testing with islower(). Note this also covers cases like numeric bucket names with dashes.

The function that performs this validation is check_lowercase_bucketname(n), and the example calls give:

>>> check_lowercase_bucketname("Aaaa")
Traceback (most recent call last):
...
BotoClientError: S3Error: Bucket names cannot contain upper-case
characters when using either the sub-domain or virtual hosting calling
format.

>>> check_lowercase_bucketname("1234-5678-9123")
True
>>> check_lowercase_bucketname("abcdefg1234")
True

The above leads me to believe that your call with 's3://KEY:SECRET@bucket/dir/part-*.json' fails because the KEY and/or SECRET values contain uppercase or otherwise disallowed characters.
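If that is the cause, one way to sidestep it is to keep the credentials out of the URI entirely. Boto reads the standard AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables, so (assuming the bucket name itself is all lowercase) something like the following may work; the credential values here are placeholders, not real keys:

```python
import os

# Hypothetical placeholder credentials -- substitute your own.
os.environ['AWS_ACCESS_KEY_ID'] = 'AKIAEXAMPLEKEY'
os.environ['AWS_SECRET_ACCESS_KEY'] = 'examplesecret'

# With the credentials in the environment, the URI needs only the
# (lowercase) bucket and key, so no uppercase characters from the
# access key ever reach boto's bucket-name validation:
#   data(S3(Chunks(JSONLines))('s3://bucket/dir/part-*.json'))
```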

Upvotes: 1
