How to load Twitter data from HDFS using Pig?

Question

I have just streamed some Twitter data using Flume and stored it in HDFS, and now I am trying to load it into Pig for analysis. Since the default JsonLoader function cannot load the data, I searched Google for a library that can load this kind of data. I found this link and followed its instructions.

Here is the result:

REGISTER '/home/hduser/Downloads/json-simple-1.1.1.jar';

2016-02-22 20:54:46,539 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS

The same message appears for the other two commands.

Now, when I try to load my data using this command:

load_tweets = LOAD '/TwitterData/' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS myMap;

it shows me this error:

2016-02-22 20:58:01,639 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve com.twitter.elephantbird.pig.load.JsonLoader using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
Details at logfile: /home/hduser/pig-0.15.0/pig_1456153061619.log

How can I solve this and load the data properly?

Note: My data is Twitter data about the recently released movie Deadpool.

Jyadav · Accepted Answer

You need to register the jar below in Pig. It contains the class you are trying to access:

elephant-bird-pig-4.1.jar

Edit: here are the complete steps.

REGISTER '/home/hdfs/json-simple-1.1.jar';

REGISTER '/home/hdfs/elephant-bird-hadoop-compat-4.1.jar';

REGISTER '/home/hdfs/elephant-bird-pig-4.1.jar';

load_tweets = LOAD '/user/hdfs/twittes.txt' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS myMap;

dump load_tweets;

I ran the steps above on my local cluster and they work fine, so you need to register these jars before running your LOAD.
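Once the load succeeds, each tweet arrives as a single map, so a common next step is to pull individual fields out of it with Pig's map-lookup operator (#). The following is only a minimal sketch: it reuses the jar paths and input path from the steps above, and it assumes each tweet's JSON has top-level keys named id and text, which may not match your data. The relation names tweet_fields and sample_tweets are made up for illustration.

```pig
-- Same REGISTER and LOAD statements as in the steps above.
REGISTER '/home/hdfs/json-simple-1.1.jar';
REGISTER '/home/hdfs/elephant-bird-hadoop-compat-4.1.jar';
REGISTER '/home/hdfs/elephant-bird-pig-4.1.jar';

load_tweets = LOAD '/user/hdfs/twittes.txt' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS myMap;

-- Look up two keys in each tweet's map.
-- ASSUMPTION: 'id' and 'text' exist as top-level keys; a missing key yields null.
tweet_fields = FOREACH load_tweets GENERATE myMap#'id' AS id, myMap#'text' AS text;

-- Inspect only a few rows instead of dumping the whole data set.
sample_tweets = LIMIT tweet_fields 10;
DUMP sample_tweets;
```

If a lookup returns only nulls, DUMP a few raw rows of load_tweets first to check which keys the maps actually contain.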
