Reputation:
I am trying to do Twitter analysis using Flume and Hive. To fetch tweets from Twitter, I have set all the required parameters (consumerKey, consumerSecret, accessToken and accessTokenSecret) in the flume.conf file:
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS
TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = <consumerKey>
TwitterAgent.sources.Twitter.consumerSecret = <consumerSecret>
TwitterAgent.sources.Twitter.accessToken = <accessToken>
TwitterAgent.sources.Twitter.accessTokenSecret = <accessTokenSecret>
TwitterAgent.sources.Twitter.keywords = hadoop, big data, analytics, bigdata, cloudera, data science, data scientiest, business intelligence, mapreduce, data warehouse, data warehousing, mahout, hbase, nosql, newsql, businessintelligence, cloudcomputing
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:9000/user/flume/tweets/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 100
I have set the class path for the Flume tarball and the flume-sources snapshot jar file in .bashrc:
export FLUME_HOME=/home/students/apache-flume-1.4.0-bin
export FLUME_SRC=/home/students/flume-sources-1.0-SNAPSHOT.jar
export PATH=$FLUME_HOME/bin:$FLUME_SRC/bin:$PATH
When I run the Flume agent:
flume-ng agent --conf-file twitter_flume.conf --name TwitterAgent -Dflume.root.logger=INFO,console -n TwitterAgent
I see the log trace below and nothing else happens:
15/06/23 23:41:55 INFO source.DefaultSourceFactory: Creating instance of source Twitter, type com.cloudera.flume.source.TwitterSource
15/06/23 23:41:55 ERROR node.PollingPropertiesFileConfigurationProvider: Failed to load configuration data. Exception follows.
org.apache.flume.FlumeException: Unable to load source type: com.cloudera.flume.source.TwitterSource, class: com.cloudera.flume.source.TwitterSource
    at org.apache.flume.source.DefaultSourceFactory.getClass(DefaultSourceFactory.java:67)
    at org.apache.flume.source.DefaultSourceFactory.create(DefaultSourceFactory.java:40)
    at org.apache.flume.node.AbstractConfigurationProvider.loadSources(AbstractConfigurationProvider.java:327)
    at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:102)
    at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:140)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.ClassNotFoundException: com.cloudera.flume.source.TwitterSource
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:190)
    at org.apache.flume.source.DefaultSourceFactory.getClass(DefaultSourceFactory.java:65)
    ... 11 more
May I know why this error is thrown even though I have already set the Flume sources jar? Please help me out with this.
Upvotes: 1
Views: 6418
Reputation: 1262
1. Here is file /usr/lib/flume-ng/conf/flume.conf:
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS
TwitterAgent.sources.Twitter.type= com.cloudera.flume.source.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = xxxxxxxxxxxxxxxxxxxxx
TwitterAgent.sources.Twitter.consumerSecret = xxxxxxxxxxxxxxxxxxxxxx
TwitterAgent.sources.Twitter.accessToken = xxxxxxxxxxxxxxx
TwitterAgent.sources.Twitter.accessTokenSecret = xxxxxxxxxxxxxxxxxx
TwitterAgent.sources.Twitter.keywords = Hadoop,BigData
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://quickstart.cloudera:8020/user/cloudera/flume/tweets/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 100
2. Copy the flume-env.sh.template file to flume-env.sh:
~]$ sudo cp /usr/lib/flume-ng/conf/flume-env.sh.template /usr/lib/flume-ng/conf/flume-env.sh
3. Set JAVA_HOME and FLUME_CLASSPATH in flume-env.sh file as:
export JAVA_HOME=/usr/java/jdk1.7.0_67-cloudera
FLUME_CLASSPATH="/usr/lib/flume-ng/lib/flume-sources-1.0-SNAPSHOT.jar"
4. If "/usr/lib/flume-ng/lib/flume-sources-1.0-SNAPSHOT.jar" does not exist on your system, download the apache-flume-1.6.0-bin distribution and replace the current lib folder with its lib folder (steps 4.1 and 4.2).
Whichever way you obtain it, make sure that flume-sources-1.0-SNAPSHOT.jar ends up in the lib folder.
4.1. Download apache-flume-1.6.0-bin and extract it on the Cloudera desktop.
4.2. Rename the old lib folder, then move the downloaded lib folder into its place:
~]$ sudo mv /usr/lib/flume-ng/lib /usr/lib/flume-ng/lib_cloudera
~]$ sudo mv /home/cloudera/Desktop/apache-flume-1.6.0-bin/lib /usr/lib/flume-ng/lib
5. Now run Flume Agent Command:
~]$ flume-ng agent --conf-file /usr/lib/flume-ng/conf/flume.conf --name TwitterAgent -Dflume.root.logger=INFO,console -n TwitterAgent
This should run successfully. All the Best.
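To confirm that the jar in the lib folder really provides the class named in the config, a quick check along these lines may help. This is only a minimal sketch: the function names class_to_entry and jar_has_class are made up for illustration, the jar path is the one assumed in step 3, and it assumes the unzip tool is installed.

```shell
# Convert a fully qualified class name to the entry name used inside a jar,
# e.g. a.b.C -> a/b/C.class
class_to_entry() {
  printf '%s.class\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Return 0 if the jar exists and lists the class, non-zero otherwise.
# (Hypothetical helper, not part of Flume.)
jar_has_class() {
  jhc_jar="$1"
  jhc_entry="$(class_to_entry "$2")"
  [ -f "$jhc_jar" ] || return 1
  # -F: match the entry name literally, not as a regular expression
  unzip -l "$jhc_jar" | grep -qF "$jhc_entry"
}

# Jar path assumed from step 3; adjust it to your system.
if jar_has_class /usr/lib/flume-ng/lib/flume-sources-1.0-SNAPSHOT.jar \
    com.cloudera.flume.source.TwitterSource; then
  echo "TwitterSource found in jar"
else
  echo "TwitterSource missing: check the jar path and its contents"
fi
```

If the check reports the class as missing, the ClassNotFoundException from the question is the expected result, whatever FLUME_CLASSPATH is set to.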
Upvotes: 1
Reputation: 153
Sorry, it actually works but make sure that you have all the jars in your flume/lib. Follow all the steps in: http://bigdatanalysis.blogspot.com.es/2014/02/collecting-tweets-in-hadoop-using-flume.html
Upvotes: 0
Reputation: 153
I think com.cloudera.flume.source.TwitterSource is no longer working. Try org.apache.flume.source.twitter.TwitterSource instead.
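If you want to try that swap without editing the file by hand, something like the sketch below should do it. The function name switch_twitter_source is invented for this example, and it assumes your Flume lib folder actually ships the Apache Twitter source; its output format and supported properties may differ from the Cloudera source, so check the Flume user guide for your version.

```shell
# Replace the Cloudera source class with the Apache one in a Flume config.
# A backup of the original file is kept next to it as <file>.bak.
# (Hypothetical helper, not part of Flume.)
switch_twitter_source() {
  sts_conf="$1"
  cp "$sts_conf" "$sts_conf.bak" || return 1
  sed 's/com\.cloudera\.flume\.source\.TwitterSource/org.apache.flume.source.twitter.TwitterSource/' \
    "$sts_conf.bak" > "$sts_conf"
}

# Usage, with the config file name from the question:
#   switch_twitter_source twitter_flume.conf
```

The backup makes it easy to go back to the Cloudera class if the Apache source does not fit your setup.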
Upvotes: 0
Reputation: 3798
You did not set the classpath but the PATH, which is used for finding executable binaries, not Java .jar files. Appending $FLUME_SRC/bin to PATH therefore has no effect on class loading.
You can either set the FLUME_CLASSPATH variable in the flume-env.sh file in your Flume conf directory, or add the --classpath <path/to/the/jar> option on the command line.
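As a rough sketch of both options, using the paths from the question (adjust them to your system): flume-ng spells the command-line option --classpath (short form -C), and as far as I know it only reads flume-env.sh from the directory passed with --conf, which the command in the question does not pass. The variable names FLUME_SRC_JAR, FLUME_DIR and FLUME_CMD are just for illustration.

```shell
# Paths taken from the question; treat them as assumptions for your setup.
FLUME_SRC_JAR="/home/students/flume-sources-1.0-SNAPSHOT.jar"
FLUME_DIR="${FLUME_HOME:-/home/students/apache-flume-1.4.0-bin}"

# Option 1: put this line in $FLUME_DIR/conf/flume-env.sh
# and start the agent with --conf "$FLUME_DIR/conf".
export FLUME_CLASSPATH="$FLUME_SRC_JAR"

# Option 2: pass the jar directly on the command line.
FLUME_CMD="flume-ng agent --conf $FLUME_DIR/conf --conf-file twitter_flume.conf --name TwitterAgent --classpath $FLUME_SRC_JAR -Dflume.root.logger=INFO,console"

# Print the command instead of running it, so you can review it first.
echo "$FLUME_CMD"
```

Either option alone should be enough; the point is that the jar has to reach the Java classpath rather than the shell's PATH.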
Upvotes: 1