MD. Rejaul Hasan

Reputation: 176

How to load Twitter data from HDFS using Pig?

I streamed some Twitter data using Flume and stored it in HDFS, and now I am trying to load it into Pig for analysis. Since the default JsonLoader function cannot load this data, I searched Google for a library that can. I found this link and followed its instructions.

Here is the result:

REGISTER '/home/hduser/Downloads/json-simple-1.1.1.jar';

2016-02-22 20:54:46,539 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS

The same happens for the other two commands.

Now, when I try to load my data using this command:

load_tweets = LOAD '/TwitterData/' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS myMap;

it shows me this error:

2016-02-22 20:58:01,639 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve com.twitter.elephantbird.pig.load.JsonLoader using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
Details at logfile: /home/hduser/pig-0.15.0/pig_1456153061619.log

How can I solve this and load the data properly?

Note: My data consists of tweets about the recently released movie Deadpool.

Upvotes: 2

Views: 1352

Answers (2)

Kiran Krishna Innamuri

Reputation: 1002

You need to register three JAR files, as shown in the blog. Each JAR has its own purpose:

elephant-bird-hadoop-compat-4.1.jar: utilities for dealing with Hadoop incompatibilities between 1.x and 2.x.

elephant-bird-pig-4.1.jar: the JSON loader for Pig; it loads each JSON record into Pig.

json-simple-1.1.1.jar: one of the JSON parsers available in Java.
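As a rough sketch, registering the three JARs from the Grunt shell could look like the following. The directory is taken from the REGISTER command in the question; that the two elephant-bird JARs sit in the same directory is an assumption, so adjust the paths to wherever you downloaded them.

```pig
-- Paths are assumptions: point each one at the actual location of the JAR.
REGISTER '/home/hduser/Downloads/elephant-bird-hadoop-compat-4.1.jar';
REGISTER '/home/hduser/Downloads/elephant-bird-pig-4.1.jar';
REGISTER '/home/hduser/Downloads/json-simple-1.1.1.jar';
```

ERROR 1070 in the question means Pig could not find the class com.twitter.elephantbird.pig.load.JsonLoader; that class comes from elephant-bird-pig, so registering json-simple alone is not enough.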

After registering the JARs, you can load the tweets with the following Pig statement:

load_tweets = LOAD '/user/flume/tweets/' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS myMap;

After loading the tweets, you can view them by dumping the relation:

dump load_tweets;
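Dumping the whole relation prints every tweet to the console, so for a large capture it may be more practical to look at a few records first. A minimal sketch, assuming the load_tweets relation from the LOAD statement above (sample_tweets is a name made up here for illustration):

```pig
-- Keep only 10 records so the console output stays readable.
sample_tweets = LIMIT load_tweets 10;
DUMP sample_tweets;
```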

Upvotes: 0

Jyadav

Reputation: 101

You need to register the JAR below in Pig; it contains the class you are trying to access.

elephant-bird-pig-4.1.jar

Edit: the complete steps are:

REGISTER '/home/hdfs/json-simple-1.1.jar';

REGISTER '/home/hdfs/elephant-bird-hadoop-compat-4.1.jar';

REGISTER '/home/hdfs/elephant-bird-pig-4.1.jar';

load_tweets = LOAD '/user/hdfs/twittes.txt' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS myMap;

dump load_tweets;

I used the above steps on my local cluster and they work fine, so you need to register these JARs before running your LOAD.
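Since each tweet is loaded as a single map, individual fields can then be pulled out with Pig's map lookup operator (#). A minimal sketch, assuming the tweets carry the usual Twitter 'id' and 'text' keys and that the map is declared with an explicit map[] schema (a small change from the untyped AS myMap above); tweet_fields is a hypothetical relation name:

```pig
-- Same load as above, but with an explicit map schema (an assumption, not part of the original steps).
load_tweets = LOAD '/user/hdfs/twittes.txt' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS (myMap:map[]);

-- Look up two keys in each tweet's map; 'id' and 'text' are assumed to be present.
tweet_fields = FOREACH load_tweets GENERATE myMap#'id' AS id, myMap#'text' AS text;

DUMP tweet_fields;
```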

Upvotes: 2
