Reputation: 176
I have just streamed some Twitter data using Flume and collected it in HDFS, and now I am trying to load it into Pig for analysis. Because the default JsonLoader function cannot load this data, I searched Google for a library that can. I found this link and followed its instructions.
Here is the result:
REGISTER '/home/hduser/Downloads/json-simple-1.1.1.jar';
2016-02-22 20:54:46,539 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
The same message appears for the other two commands.
Now, when I try to load my data using this command:
load_tweets = LOAD '/TwitterData/' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS myMap;
it shows me this error:
2016-02-22 20:58:01,639 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve com.twitter.elephantbird.pig.load.JsonLoader using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
Details at logfile: /home/hduser/pig-0.15.0/pig_1456153061619.log
So how can I solve this and load the data properly?
Note: My data consists of tweets about the recently released movie Deadpool.
Upvotes: 2
Views: 1352
Reputation: 1002
You need to register three JAR files, as shown in the blog. Each JAR serves its own purpose:
elephant-bird-hadoop-compat-4.1.jar: utilities for dealing with Hadoop incompatibilities between 1.x and 2.x.
elephant-bird-pig-4.1.jar: the JSON loader for Pig; it loads each JSON record into Pig.
json-simple-1.1.1.jar: one of the JSON parsers available for Java.
After registering the JARs, you can load the tweets with the following Pig script:
load_tweets = LOAD '/user/flume/tweets/' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS myMap;
After loading the tweets, you can view them by dumping the relation:
dump load_tweets;
Upvotes: 0
Reputation: 101
You need to register the JAR below in Pig; it contains the class you are trying to access.
elephant-bird-pig-4.1.jar
EDITED: Here are the complete steps.
REGISTER '/home/hdfs/json-simple-1.1.jar';
REGISTER '/home/hdfs/elephant-bird-hadoop-compat-4.1.jar';
REGISTER '/home/hdfs/elephant-bird-pig-4.1.jar';
load_tweets = LOAD '/user/hdfs/twittes.txt' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS myMap;
dump load_tweets;
I used the above steps on my local cluster and they work fine, so you need to register these JARs before running your load.
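Once the load works, each record arrives as a single map, so a typical next step is to pull individual fields out of it. The following is only a rough sketch, not something I have run against your data: it assumes the tweets contain the usual top-level 'id' and 'text' keys, and the aliases tweet_fields, first_tweets, tweet_id and tweet_text are names I made up for illustration.

```pig
-- Assumes load_tweets was defined by the LOAD statement above.
-- 'id' and 'text' are assumed to be top-level keys in each tweet's JSON.
tweet_fields = FOREACH load_tweets GENERATE myMap#'id' AS tweet_id, myMap#'text' AS tweet_text;

-- Look at a handful of rows instead of dumping the whole data set.
first_tweets = LIMIT tweet_fields 10;
dump first_tweets;
```

If a key is missing from a record, the map lookup should simply yield null for that field, so it is worth checking a few rows before building further on the result.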
Upvotes: 2