Manoj Gogineni
Manoj Gogineni

Reputation: 61

Amazon Athena - S3 location error

I am getting below error while running an Amazon Athena query on an S3 bucket.

I am running this query on CloudFront access logs.

CREATE EXTERNAL TABLE IF NOT EXISTS cloudfront.cf_logs (
  `date` date,
  `time` string,
  `location` string,
  `bytes` int,
  `requestip` string,
  `method` string,
  `host` string,
  `uri` string,
  `status` int,
  `referrer` string,
  `os` string,
  `browser` string,
  `browserversion` string 
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
  'serialization.format' = '1'
) LOCATION 's3://cloudfront-access/test-sh/'
TBLPROPERTIES ('has_encrypted_data'='false');

Error returned:

Your query has the following error(s):

The S3 location provided to save your query results is invalid. Please
check your S3 location is correct and is in the same region and try
again. If you continue to see the issue, contact customer support for
further assistance. (Service: AmazonAthena; Status Code: 400; Error
Code: InvalidRequestException; Request ID:
f8cd2762-1e7-a2f9-e5eb1d865406)

Upvotes: 5

Views: 14191

Answers (2)

Zerodf
Zerodf

Reputation: 2298

When you run Data Definition Language (DDL), you actually produce output to S3, just like if you had run DML.

For example, see the following example as illustrated with the AWS CLI:

  1. Create a bucket for this example

    $ aws s3 mb s3://athena-covid/

  2. Get some data (Source: Covid Tracking Project)

    $ wget -o tx.csv https://api.covidtracking.com/v1/states/tx/daily.csv

  3. Upload those data to S3

    $ aws s3 cp daily.csv s3://athena-covid/src/daily.csv

  4. Now run some DDL

    $ aws athena start-query-execution --result-configuration OutputLocation=s3://athena-covid/out/ --query-string "$(cat ddl.sql)"

    which returns

    { "QueryExecutionId": "427fd5d0-02cf-49e6-82eb-0c25aae46e80" }

  5. Even though the query in ddl.sql returns no result set, it still generated an empty text file in the output location specified in the -result-configuration above.

    $ aws s3 ls s3://athena-covid/out/

    Which returns

    2021-03-06 12:52:25 0 427fd5d0-02cf-49e6-82eb-0c25aae46e80.txt

    Notice the 0 showing the size of the object in S3

  6. Of course, if we run normal DML, we will get an actual result set.

    $ aws athena start-query-execution --result-configuration OutputLocation=s3://athena-covid/out/ --query-string "SELECT data_date, state, positive, negative FROM default.tx_covid LIMIT 10"

    Returning:

    { "QueryExecutionId": "77b548ee-4724-4716-9b3a-95acbb8bb275" }

    And a csv with some data in it.

    $ aws s3 ls s3://athena-covid/out/77b548ee-4724-4716-9b3a-95acbb8bb275.csv

    Returning

     2021-03-06 12:57:00        312 77b548ee-4724-4716-9b3a-95acbb8bb275.csv
     2021-03-06 12:57:00        213 77b548ee-4724-4716-9b3a-95acbb8bb275.csv.metadata
    

I hope the above is illustrative of some concepts of how Athena works. All queries have an OutputLocation.

FYI...DDL Below

CREATE EXTERNAL TABLE default.tx_covid (
  data_date STRING,
  state STRING,
  positive INTEGER,
  probableCases INTEGER,
  negative INTEGER,
  pending INTEGER,
  totalTestResultsSource STRING,
  totalTestResults INTEGER,
  hospitalizedCurrently INTEGER,
  hospitalizedCumulative INTEGER,
  inIcuCurrently INTEGER,
  inIcuCumulative INTEGER,
  onVentilatorCurrently INTEGER,
  onVentilatorCumulative INTEGER,
  recovered INTEGER,
  lastUpdateEt INTEGER,
  dateModified INTEGER,
  checkTimeEt INTEGER,
  death INTEGER,
  hospitalized INTEGER,
  hospitalizedDischarged INTEGER,
  dateChecked STRING,
  totalTestsViral INTEGER,
  positiveTestsViral INTEGER,
  negativeTestsViral INTEGER,
  positiveCasesViral INTEGER,
  deathConfirmed INTEGER,
  deathProbable INTEGER,
  totalTestEncountersViral INTEGER,
  totalTestsPeopleViral INTEGER,
  totalTestsAntibody INTEGER,
  positiveTestsAntibody INTEGER,
  negativeTestsAntibody INTEGER,
  totalTestsPeopleAntibody INTEGER,
  positiveTestsPeopleAntibody INTEGER,
  negativeTestsPeopleAntibody INTEGER,
  totalTestsPeopleAntigen INTEGER,
  positiveTestsPeopleAntigen INTEGER,
  totalTestsAntigen INTEGER,
  positiveTestsAntigen INTEGER,
  fips STRING,
  positiveIncrease INTEGER,
  negativeIncrease INTEGER,
  total INTEGER,
  totalTestResultsIncrease INTEGER,
  posNeg INTEGER,
  dataQualityGrade INTEGER,
  deathIncrease INTEGER,
  hospitalizedIncrease INTEGER,
  hash STRING,
  commercialScore INTEGER,
  negativeRegularScore INTEGER,
  negativeScore INTEGER,
  positiveScore INTEGER,
  score INTEGER,
  grade INTEGER
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  ESCAPED BY '\\'
  LINES TERMINATED BY '\n'
  LOCATION 's3://athena-covid/src/'
  TBLPROPERTIES ('skip.header.line.count'='1')

Upvotes: 1

John Rotenstein
John Rotenstein

Reputation: 269400

Amazon Athena saves the output of each query in an Amazon S3 bucket. The error message is saying that Athena is unable to access this bucket.

Settings link

  • Click the Settings link at the top of the screen
  • Verify that a bucket name is shown (feel free to change it if you wish)
  • Verify in the Amazon S3 management console that a bucket of that name exists in the same region. If not, create the bucket.

Bucket name

Upvotes: 7

Related Questions