Reputation: 1421
I am trying to run a simple NaiveBayesClassifier using Hadoop and I'm getting this error:
Exception in thread "main" java.io.IOException: No FileSystem for scheme: file
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1375)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:95)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:180)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
at org.apache.mahout.classifier.naivebayes.NaiveBayesModel.materialize(NaiveBayesModel.java:100)
Code:
Configuration configuration = new Configuration();
NaiveBayesModel model = NaiveBayesModel.materialize(new Path(modelPath), configuration);// error in this line..
modelPath points to a NaiveBayes.bin file, and the configuration object prints: Configuration: core-default.xml, core-site.xml
I think it's because of the jars; any ideas?
Upvotes: 112
Views: 141897
Reputation: 136
If you're using the Gradle Shadow plugin, then this is the config you have to add:
shadowJar {
mergeServiceFiles()
}
Upvotes: 3
Reputation: 155
This question is old, but I faced the same issue recently, and the origin of the error was different from the one in the other answers here.
On my side, the root cause was HDFS trying to parse an authority when it encountered // at the beginning of a path:
$ hdfs dfs -ls //dev
ls: No FileSystem for scheme: null
So look for a double slash or an empty variable in the path-building part of your code.
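For example, here is a hypothetical sketch (the variable, class, and path names are made up) of how a slash-valued or empty variable can produce such a path from Java:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical example: baseDir unexpectedly ends up as "/", so the concatenated
// path starts with "//", "dev" is parsed as an authority, and the scheme is null.
public class DoubleSlashExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String baseDir = "/";                      // e.g. read from a config value
        Path p = new Path(baseDir + "/dev");       // becomes "//dev"
        FileSystem fs = p.getFileSystem(conf);     // fails with "No FileSystem for scheme: null"
    }
}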
Related Hadoop ticket: https://issues.apache.org/jira/browse/HADOOP-8087
Upvotes: 1
Reputation: 6516
This is a typical case of the maven-assembly plugin breaking things.
Different JARs (hadoop-common for LocalFileSystem, hadoop-hdfs for DistributedFileSystem) each contain a different file called org.apache.hadoop.fs.FileSystem in their META-INF/services directory. This file lists the canonical class names of the filesystem implementations they want to declare (this is called a Service Provider Interface, implemented via java.util.ServiceLoader; see org.apache.hadoop.fs.FileSystem#loadFileSystems).
When we use the maven-assembly-plugin, it merges all our JARs into one, and all the META-INF/services/org.apache.hadoop.fs.FileSystem files overwrite each other. Only one of these files remains (the last one that was added). In this case, the FileSystem list from hadoop-common overwrites the list from hadoop-hdfs, so DistributedFileSystem is no longer declared.
After loading the Hadoop configuration, but just before doing anything FileSystem-related, we call this:
hadoopConfig.set("fs.hdfs.impl",
org.apache.hadoop.hdfs.DistributedFileSystem.class.getName()
);
hadoopConfig.set("fs.file.impl",
org.apache.hadoop.fs.LocalFileSystem.class.getName()
);
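If you want to double-check which FileSystem implementations actually made it onto your runtime classpath via those service files, a small diagnostic sketch (the class name is arbitrary) could look like this:
import java.util.ServiceLoader;
import org.apache.hadoop.fs.FileSystem;

// Prints every FileSystem implementation registered through
// META-INF/services/org.apache.hadoop.fs.FileSystem on the current classpath.
public class ListRegisteredFileSystems {
    public static void main(String[] args) {
        for (FileSystem fs : ServiceLoader.load(FileSystem.class)) {
            System.out.println(fs.getClass().getName());
        }
    }
}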
It has been brought to my attention by krookedking that there is a configuration-based way to make the maven-assembly plugin use a merged version of all the FileSystem service declarations; check out his answer below.
Upvotes: 196
Reputation: 822
This is not specific to Flink, but I ran into this issue in Flink as well.
For people using Flink, you need to download the Pre-bundled Hadoop and put it inside /opt/flink/lib.
Upvotes: 2
Reputation: 361
I faced the same problem. I found two solutions:
(1) Editing the JAR file manually: open the JAR file with WinRAR (or a similar tool), go to META-INF > services, and edit "org.apache.hadoop.fs.FileSystem" by appending:
org.apache.hadoop.fs.LocalFileSystem
(2) Changing the order of my dependencies as follows:
<dependencies>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-hdfs</artifactId>
<version>3.2.1</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>3.2.1</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-mapreduce-client-core</artifactId>
<version>3.2.1</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>3.2.1</version>
</dependency>
</dependencies>
Upvotes: 3
Reputation: 33
For SBT, use the mergeStrategy below in build.sbt:
mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) => {
case PathList("META-INF", "services", "org.apache.hadoop.fs.FileSystem") => MergeStrategy.filterDistinctLines
case s => old(s)
}
}
Upvotes: 1
Reputation: 444
I also came across a similar issue. I added core-site.xml and hdfs-site.xml as resources of the conf object:
Configuration conf = new Configuration(true);
conf.addResource(new Path("<path to>/core-site.xml"));
conf.addResource(new Path("<path to>/hdfs-site.xml"));
I also fixed version conflicts in pom.xml (e.g. if the configured version of Hadoop is 2.8.1, but the pom.xml dependencies have version 2.7.1, change them to 2.8.1), then ran Maven install again.
This solved the error for me.
Upvotes: -1
Reputation: 11
Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://nameNode:9000");
FileSystem fs = FileSystem.get(conf);
Setting fs.defaultFS works for me! Hadoop 2.8.1
Upvotes: 1
Reputation: 11
If you are using sbt:
//hadoop
lazy val HADOOP_VERSION = "2.8.0"
lazy val dependenceList = Seq(
//hadoop
//The order is important: "hadoop-hdfs" and then "hadoop-common"
"org.apache.hadoop" % "hadoop-hdfs" % HADOOP_VERSION
,"org.apache.hadoop" % "hadoop-common" % HADOOP_VERSION
)
Upvotes: 0
Reputation: 86188
It took me some time to figure out the fix from the given answers, since I'm a newbie. This is what I came up with, in case anyone else needs help from the very beginning:
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
object MyObject {
  def main(args: Array[String]): Unit = {
    val mySparkConf = new SparkConf().setAppName("SparkApp").setMaster("local[*]").set("spark.executor.memory", "5g")
    val sc = new SparkContext(mySparkConf)
    val conf = sc.hadoopConfiguration
    conf.set("fs.hdfs.impl", classOf[org.apache.hadoop.hdfs.DistributedFileSystem].getName)
    conf.set("fs.file.impl", classOf[org.apache.hadoop.fs.LocalFileSystem].getName)
  }
}
I am using Spark 2.1, and I have this part in my build.sbt:
assemblyMergeStrategy in assembly := {
case PathList("META-INF", xs @ _*) => MergeStrategy.discard
case x => MergeStrategy.first
}
Upvotes: 1
Reputation: 121
Took me ages to figure it out with Spark 2.0.2, but here's my bit:
val sparkBuilder = SparkSession.builder
.appName("app_name")
.master("local")
// Various Params
.getOrCreate()
val hadoopConfig: Configuration = sparkBuilder.sparkContext.hadoopConfiguration
hadoopConfig.set("fs.hdfs.impl", classOf[org.apache.hadoop.hdfs.DistributedFileSystem].getName)
hadoopConfig.set("fs.file.impl", classOf[org.apache.hadoop.fs.LocalFileSystem].getName)
And the relevant parts of my build.sbt:
scalaVersion := "2.11.8"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.2"
I hope this can help!
Upvotes: 12
Reputation: 28646
Another possible cause (though the OP's question doesn't itself suffer from this) is if you create a Configuration instance that does not load the defaults:
Configuration config = new Configuration(false);
If you don't load the defaults, then you won't get the default settings for things like the FileSystem implementations, which leads to identical errors like this when trying to access HDFS. Switching to the parameterless constructor or passing in true to load the defaults may resolve this.
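For illustration, a minimal sketch of that fix:
// Loads the default resources (core-default.xml, core-site.xml), so the
// default FileSystem-related settings are available again.
Configuration config = new Configuration();   // equivalent to new Configuration(true)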
Additionally, if you are adding custom configuration locations (e.g. on the file system) to the Configuration object, be careful which overload of addResource() you use. For example, if you use addResource(String), Hadoop assumes that the string is a classpath resource; if you need to specify a local file, try the following:
File configFile = new File("example/config.xml");
config.addResource(new Path("file://" + configFile.getAbsolutePath()));
Upvotes: 2
Reputation: 1607
Use this plugin:
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>1.5</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <filters>
          <filter>
            <artifact>*:*</artifact>
            <excludes>
              <exclude>META-INF/*.SF</exclude>
              <exclude>META-INF/*.DSA</exclude>
              <exclude>META-INF/*.RSA</exclude>
            </excludes>
          </filter>
        </filters>
        <shadedArtifactAttached>true</shadedArtifactAttached>
        <shadedClassifierName>allinone</shadedClassifierName>
        <artifactSet>
          <includes>
            <include>*:*</include>
          </includes>
        </artifactSet>
        <transformers>
          <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
            <resource>reference.conf</resource>
          </transformer>
          <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer"/>
          <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>
Upvotes: 0
Reputation: 389
For Maven, just adding the Maven dependency for hadoop-hdfs (see the link below) will solve the issue.
http://mvnrepository.com/artifact/org.apache.hadoop/hadoop-hdfs/2.7.1
Upvotes: 7
Reputation: 2303
For those using the shade plugin, following on david_p's advice, you can merge the services in the shaded jar by adding the ServicesResourceTransformer to the plugin config:
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.3</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <transformers>
          <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>
This will merge all the org.apache.hadoop.fs.FileSystem services into one file.
Upvotes: 78
Reputation: 347
Thanks david_p. In Scala:
conf.set("fs.hdfs.impl", classOf[org.apache.hadoop.hdfs.DistributedFileSystem].getName);
conf.set("fs.file.impl", classOf[org.apache.hadoop.fs.LocalFileSystem].getName);
or
<property>
<name>fs.hdfs.impl</name>
<value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
</property>
Upvotes: 9
Reputation: 91
For the record, this is still happening in Hadoop 2.4.0. So frustrating...
I was able to follow the instructions in this link: http://grokbase.com/t/cloudera/scm-users/1288xszz7r/no-filesystem-for-scheme-hdfs
I added the following to my core-site.xml and it worked:
<property>
<name>fs.file.impl</name>
<value>org.apache.hadoop.fs.LocalFileSystem</value>
<description>The FileSystem for file: uris.</description>
</property>
<property>
<name>fs.hdfs.impl</name>
<value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
<description>The FileSystem for hdfs: uris.</description>
</property>
Upvotes: 9
Reputation: 12965
I use sbt assembly to package my project, and I also ran into this problem. My solution is below.
Step 1: add a META-INF merge strategy in your build.sbt:
case PathList("META-INF", "MANIFEST.MF") => MergeStrategy.discard
case PathList("META-INF", ps @ _*) => MergeStrategy.first
Step 2: add the hadoop-hdfs library to build.sbt:
"org.apache.hadoop" % "hadoop-hdfs" % "2.4.0"
Step 3: run sbt clean; sbt assembly
I hope the above information helps.
Upvotes: 6
Reputation: 313
This assumes that you are using Maven and the Cloudera distribution of Hadoop. I'm using CDH 4.6, and adding these dependencies worked for me. I think you should check the versions of your Hadoop and Maven dependencies.
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-core</artifactId>
<version>2.0.0-mr1-cdh4.6.0</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>2.0.0-cdh4.6.0</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>2.0.0-cdh4.6.0</version>
</dependency>
Don't forget to add the Cloudera Maven repository:
<repository>
<id>cloudera</id>
<url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
</repository>
Upvotes: 5
Reputation: 13046
I assume you built the sample using Maven.
Please check the content of the JAR you're trying to run, especially the META-INF/services directory and the file org.apache.hadoop.fs.FileSystem. It should contain the list of filesystem implementation classes. Check that the line org.apache.hadoop.hdfs.DistributedFileSystem is present in the list for HDFS and that org.apache.hadoop.fs.LocalFileSystem is present for the local file scheme.
If it is missing, you have to override the referred resource during the build.
The other possibility is that you simply don't have hadoop-hdfs.jar on your classpath, but that has a low probability. Usually, if you have the correct hadoop-client dependency, this is not the issue.
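For example, a quick way to dump that services entry from the assembled JAR in Java (the JAR path below is just a placeholder):
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Prints the FileSystem service declarations bundled in the assembled JAR.
// "target/your-assembly.jar" is a placeholder path, not from the original answer.
public class CheckFileSystemServices {
    public static void main(String[] args) throws Exception {
        try (JarFile jar = new JarFile("target/your-assembly.jar")) {
            JarEntry entry = jar.getJarEntry("META-INF/services/org.apache.hadoop.fs.FileSystem");
            if (entry == null) {
                System.out.println("No FileSystem service file found in the JAR");
                return;
            }
            try (BufferedReader reader = new BufferedReader(new InputStreamReader(jar.getInputStream(entry)))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    System.out.println(line);
                }
            }
        }
    }
}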
Upvotes: 2