Gaios
Gaios

Reputation: 13

"Path is not legal" error when loading data from S3 into external Hive table located in S3 in EMR

I have an EMR cluster that I'm running Hive on. I have an external table on S3 defined as such:

+-----------------------------------------------------------------+
| CREATE EXTERNAL TABLE `blah`(                                   |
|   `blah1` string,                                               |
|   `blah2` string)                                               |
| PARTITIONED BY (                                                |
|   `blah3` string,                                               |
|   `blah4` string,                                               |
| ROW FORMAT DELIMITED                                            |
|   FIELDS TERMINATED BY '\t'                                     |
| STORED AS INPUTFORMAT                                           |
|   'org.apache.hadoop.mapred.TextInputFormat'                    |
| OUTPUTFORMAT                                                    |
|   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'  |
| LOCATION                                                        |
|   's3://REDACTED/events'                                        |
| TBLPROPERTIES (                                                 |
|   'transient_lastDdlTime'='REDACTED')                           |
+-----------------------------------------------------------------+

This table has some data that I'm able to see via SELECT statements in beeline. That data came from a transfer using s3distcp.

The issue that I'm running into is that I get the following error when I try to load data via a LOAD DATA INPATH command:

Error: Error while compiling statement: FAILED: SemanticException [Error 10028]: Line 1:17 Path is not legal ''s3://REDACTED/FILE_I_WANT_TO_LOAD'': Move from: s3://REDACTED/FILE_I_WANT_TO_LOAD to: s3://REDACTED/PARTITION_IN_TABLE is not valid. Please check that values for params "default.fs.name" and "hive.metastore.warehouse.dir" do not conflict. (state=42000,code=10028)

Note that the file I'm trying to load and the table I'm trying to load into exist in different buckets.

I think this issue comes down to setting S3 as the base file system for Hive but I've tried editing default.fs.name and hive.metastore.warehouse.dir to both be the S3 bucket containing the table. Doing that produced the same exact error.

Any help on this would be greatly appreciated!

Upvotes: 1

Views: 1277

Answers (2)

ChristopherB
ChristopherB

Reputation: 2068

Don't use LOAD, just define an external table for each S3 location and then use a INSERT INTO ... SELECT ... style query.

Upvotes: 1

Sandesh Deshmane
Sandesh Deshmane

Reputation: 2305

Can you try with location as s3n://

Upvotes: 0

Related Questions