I have an HDInsight cluster on Azure and .csv files in HDFS (Azure storage).
Using Apache Pig, I want to process these files and store the output in a Hive table. To achieve this, I have written the following script:
A = LOAD '/test/input/t12007.csv' USING PigStorage(',') AS (year:chararray,ArrTime:chararray,DeptTime:chararray);
describe A;
dump A;
store A into 'testdb.tbl3' using org.apache.hive.hcatalog.pig.HCatStorer();
The script loads the file successfully, describes the structure, and displays the data with dump, but when the store command executes it throws the following error:
2017-05-02 06:18:41,476 [main] ERROR org.apache.pig.PigServer - exception during parsing: Error during parsing. Could not resolve org.apache.hive.hcatalog.pig.HCatStorer using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
Failed to parse: Pig script failed to parse: <file script.pig, line 4, column 33> pig script failed to validate: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve org.apache.hive.hcatalog.pig.HCatStorer using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
Caused by: <file script.pig, line 4, column 33> pig script failed to validate: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve org.apache.hive.hcatalog.pig.HCatStorer using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve org.apache.hive.hcatalog.pig.HCatStorer using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
2017-05-02 06:18:41,484 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve org.apache.hive.hcatalog.pig.HCatStorer using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
Run Pig with the -useHCatalog flag so that the HCatalog jars are picked up:
pig -useHCatalog
From the Pig HCatalog documentation:
Running Pig with HCatalog
Pig does not automatically pick up HCatalog jars. To bring in the necessary jars, you can either use a flag in the pig command or set the environment variables PIG_CLASSPATH and PIG_OPTS as described below. To bring in the appropriate jars for working with HCatalog, simply include the -useHCatalog flag in your pig command.
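Applied to the script from the question, this might look like the sketch below. It assumes the script is saved as script.pig (the file name shown in the error log) and that the Hive table testdb.tbl3 already exists with matching columns. The helper name write_script is made up for illustration.

```shell
#!/bin/sh
# Sketch: save the Pig script from the question and launch it with the HCatalog jars.
# write_script is a hypothetical helper; it writes the two-statement script to the given path.
write_script() {
  cat > "$1" <<'EOF'
A = LOAD '/test/input/t12007.csv' USING PigStorage(',') AS (year:chararray,ArrTime:chararray,DeptTime:chararray);
STORE A INTO 'testdb.tbl3' USING org.apache.hive.hcatalog.pig.HCatStorer();
EOF
}

write_script script.pig

# Launch only when pig is on the PATH (i.e. on a cluster node).
if command -v pig >/dev/null 2>&1; then
  pig -useHCatalog script.pig
else
  echo "pig not found; run this on a cluster node" >&2
fi
```

With the flag in place, the fully qualified class name org.apache.hive.hcatalog.pig.HCatStorer should resolve without any REGISTER statement in the script.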
Alternate way:
Specify the location of the HCatalog jar and add a REGISTER statement with the path of the jar to the top of your script, as below.
REGISTER /usr/username/client/lib/hive-hcatalog-core-1.2.1.2.3.0.0-2557.jar;
The path may differ depending on how your cluster is installed. You can find the jar's location with the command: locate *hcatalog-core*
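If locate is not installed or its database is stale, find can do the same search. This is a minimal sketch, assuming the jar lives somewhere under /usr (a guess; adjust the search root for your cluster). register_line is a hypothetical helper that prints the statement to paste at the top of the script.

```shell
#!/bin/sh
# register_line (hypothetical helper): print a Pig REGISTER statement for a jar path.
register_line() {
  printf 'REGISTER %s;\n' "$1"
}

# The search root /usr is an assumption; the jar may live elsewhere on your cluster.
jar=$(find /usr -name 'hive-hcatalog-core*.jar' 2>/dev/null | head -n 1)

if [ -n "$jar" ]; then
  register_line "$jar"
else
  echo "no hive-hcatalog-core jar found under /usr" >&2
fi
```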
HCatStorer
HCatStorer is used with Pig scripts to write data to HCatalog-managed tables.
Usage
HCatStorer is accessed via a Pig store statement.
STORE A INTO 'tablename'
USING org.apache.hive.hcatalog.pig.HCatStorer();