Reputation: 13
I have an EMR cluster that I'm running Hive on. I have an external table on S3 defined as follows:
CREATE EXTERNAL TABLE `blah`(
  `blah1` string,
  `blah2` string)
PARTITIONED BY (
  `blah3` string,
  `blah4` string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  's3://REDACTED/events'
TBLPROPERTIES (
  'transient_lastDdlTime'='REDACTED')
This table has some data that I'm able to see via SELECT
statements in beeline. That data came from a transfer using s3distcp.
The issue that I'm running into is that I get the following error when I try to load data via a LOAD DATA INPATH
command:
Error: Error while compiling statement: FAILED: SemanticException [Error 10028]: Line 1:17 Path is not legal ''s3://REDACTED/FILE_I_WANT_TO_LOAD'': Move from: s3://REDACTED/FILE_I_WANT_TO_LOAD to: s3://REDACTED/PARTITION_IN_TABLE is not valid. Please check that values for params "default.fs.name" and "hive.metastore.warehouse.dir" do not conflict. (state=42000,code=10028)
Note that the file I'm trying to load and the table I'm trying to load into exist in different buckets.
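For reference, the statement I'm running is roughly this (the bucket, file, and partition values below are placeholders, not the real names):

LOAD DATA INPATH 's3://source-bucket/FILE_I_WANT_TO_LOAD'
INTO TABLE blah
PARTITION (blah3='some_value', blah4='other_value');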
I think this issue comes down to S3 not being set as the base file system for Hive, but I've tried editing default.fs.name and hive.metastore.warehouse.dir so that both point at the S3 bucket containing the table, and that produced the exact same error.
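In case it helps, the kind of change I tried looks roughly like this, shown here as session-level SET overrides in beeline (the bucket name is a placeholder; the property is spelled fs.defaultFS / fs.default.name in the Hadoop configs, even though the Hive error prints "default.fs.name"):

-- Point Hive's default file system and warehouse at the bucket holding the table
SET fs.defaultFS=s3://table-bucket;
SET hive.metastore.warehouse.dir=s3://table-bucket/warehouse;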
Any help on this would be greatly appreciated!
Upvotes: 1
Views: 1277
Reputation: 2068
Don't use LOAD. Instead, define an external table for each S3 location and then use an INSERT INTO ... SELECT ... style query.
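A minimal sketch of that approach, assuming the file you want to load sits under an S3 prefix you can point a table at (the staging table name, location, and partition values below are placeholders):

-- Hypothetical staging table over the S3 prefix that contains the file to load,
-- using the same delimited text layout as the target table
CREATE EXTERNAL TABLE blah_staging(
  blah1 string,
  blah2 string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
LOCATION 's3://source-bucket/staging/';

-- Copy the rows into the target partition; Hive writes new files under the
-- table's own S3 location, so no cross-bucket move is needed
INSERT INTO TABLE blah PARTITION (blah3='some_value', blah4='other_value')
SELECT blah1, blah2 FROM blah_staging;

This sidesteps the problem entirely: LOAD DATA tries to move the file into the table's partition directory, which is exactly the move the error complains about, while INSERT ... SELECT just reads from the source table and writes fresh data under the target partition.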
Upvotes: 1