zohar

Reputation: 2378

Hadoop: How to unit test FileSystem

I want to run a unit test, but I need an org.apache.hadoop.fs.FileSystem instance. Is there a mock or any other solution for creating a FileSystem?

Upvotes: 18

Views: 17366

Answers (10)

Arnon Rotem-Gal-Oz

Reputation: 25909

Take a look at the hadoop-test jar

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-test</artifactId>
    <version>0.20.205.0</version>
</dependency>

It has classes for setting up a MiniDFSCluster and a MiniMRCluster, so you can test without a running Hadoop installation.
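A minimal sketch of using MiniDFSCluster from that era's hadoop-test jar might look like the following; the four-argument constructor shown matches the old 0.20.x API, so check the Javadoc of the exact version you depend on:

```java
// Sketch against the old hadoop-test (0.20.x) API; constructor signature
// assumed from that release line -- verify against your jar's Javadoc.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.MiniDFSCluster;

public class MiniDfsExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // 0.20.x-style constructor: (conf, numDataNodes, format, racks)
        MiniDFSCluster cluster = new MiniDFSCluster(conf, 1, true, null);
        try {
            FileSystem fs = cluster.getFileSystem();
            fs.mkdirs(new Path("/tmp/test"));
            System.out.println(fs.exists(new Path("/tmp/test")));
        } finally {
            cluster.shutdown(); // always tear the in-process cluster down
        }
    }
}
```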

Upvotes: 9

Thomas Decaux

Reputation: 22671

My solution is to create a `DummyFileSystem` that extends the abstract Hadoop `FileSystem`, so I can fake whether a file exists or not, etc. Example of "all files exist":

@Override
public FileStatus getFileStatus(Path f) throws IOException {
    return new FileStatus(10, false, 3, 128 * 1024 * 1024, 1, 1, null, null, null, f);
}

I found it easier to keep control over the faked data.
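A fuller sketch of this approach could look like the class below. The class name is illustrative; the overridden method set matches the abstract methods of the Hadoop 2.x `FileSystem`, and everything the test doesn't exercise is stubbed or throws:

```java
// Hedged sketch of a hand-scripted fake FileSystem; class name is illustrative.
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;
import org.apache.hadoop.util.Progressable;

public class DummyFileSystem extends FileSystem {
    @Override
    public URI getUri() { return URI.create("dummy:///"); }

    // Pretend every path exists as a 10-byte file.
    @Override
    public FileStatus getFileStatus(Path f) throws IOException {
        return new FileStatus(10, false, 3, 128 * 1024 * 1024, 1, 1, null, null, null, f);
    }

    // Remaining abstract methods: stub out, or throw if the test never calls them.
    @Override
    public FSDataInputStream open(Path f, int bufferSize) throws IOException {
        throw new UnsupportedOperationException();
    }
    @Override
    public FSDataOutputStream create(Path f, FsPermission permission, boolean overwrite,
            int bufferSize, short replication, long blockSize, Progressable progress)
            throws IOException {
        throw new UnsupportedOperationException();
    }
    @Override
    public FSDataOutputStream append(Path f, int bufferSize, Progressable progress)
            throws IOException {
        throw new UnsupportedOperationException();
    }
    @Override
    public boolean rename(Path src, Path dst) { return true; }
    @Override
    public boolean delete(Path f, boolean recursive) { return true; }
    @Override
    public FileStatus[] listStatus(Path f) throws IOException {
        return new FileStatus[] { getFileStatus(f) };
    }
    @Override
    public void setWorkingDirectory(Path newDir) { }
    @Override
    public Path getWorkingDirectory() { return new Path("/"); }
    @Override
    public boolean mkdirs(Path f, FsPermission permission) { return true; }
}
```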

Upvotes: 1

Driss NEJJAR

Reputation: 978

I tried the solutions from Thirupathi Chavati and Alexander Tokarev with sbt, and:

import org.apache.hadoop.hdfs.MiniDFSCluster

will only work by adding:

libraryDependencies += "org.apache.hadoop" % "hadoop-hdfs" % "2.8.1" classifier "tests"
libraryDependencies += "org.apache.hadoop" % "hadoop-common" % "2.8.1" classifier "tests"

Upvotes: 1

Thirupathi Chavati

Reputation: 1861

Add the dependency below:

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-minicluster</artifactId>
    <version>2.7.3</version>
    <!-- <scope>test</scope> -->
</dependency>

Add the code below; it will create the FileSystem:

import java.nio.file.{Files, Paths}

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.hdfs.MiniDFSCluster

object MiniClusterDemo extends App {
  def sysDir: String = System.getProperty("user.dir")

  val fs = miniCluster
  if (fs != null) println("Cluster created and active") else println("something went wrong")

  def miniCluster: FileSystem = {
    val basePath = Paths.get(sysDir)
    val baseDir = Files.createTempDirectory(basePath, "hdfs_test").toFile.getAbsoluteFile
    val conf = new Configuration()
    conf.set(MiniDFSCluster.HDFS_MINIDFS_BASEDIR, baseDir.getAbsolutePath)
    val hdfsCluster = new MiniDFSCluster.Builder(conf).build()
    val hdfsURI = s"hdfs://localhost:${hdfsCluster.getNameNodePort}/"
    val fileSystem = hdfsCluster.getFileSystem
    // Remember to clean up when you are done:
    // hdfsCluster.shutdown()
    // FileUtil.fullyDelete(baseDir)
    fileSystem
  }
}

The MiniCluster prints sample logs on startup (screenshot omitted).

Upvotes: 0

yishaiz

Reputation: 2583

You can use HBaseTestingUtility:

public class SomeTest {
    private HBaseTestingUtility testingUtil = new HBaseTestingUtility();

    @Before
    public void setup() throws Exception {
        testingUtil.startMiniDFSCluster(1);
    }

    @After
    public void tearDown() throws IOException {
        testingUtil.shutdownMiniDFSCluster();
    }

    @Test
    public void test() throws Exception {
        DistributedFileSystem fs = testingUtil.getDFSCluster().getFileSystem();
        final Path dstPath = new Path("/your/path/file.txt");
        final Path srcPath = new Path(SomeTest.class.getResource("file.txt").toURI());
        fs.copyFromLocalFile(srcPath, dstPath);
        // ...
    }
}

Upvotes: 0

Art

Reputation: 1340

You might want to take a look at RawLocalFileSystem, though I think you'd be better off just mocking it.
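For the first option, a minimal sketch: `FileSystem.getLocal(conf)` returns a `LocalFileSystem` (which wraps `RawLocalFileSystem`), so tests exercise the `FileSystem` API against the local disk instead of HDFS:

```java
// Hedged sketch: back the FileSystem API with the local disk, no HDFS needed.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LocalFsExample {
    public static void main(String[] args) throws Exception {
        // LocalFileSystem, backed by the ordinary local filesystem
        FileSystem fs = FileSystem.getLocal(new Configuration());
        Path p = new Path(System.getProperty("java.io.tmpdir"), "fs-test");
        fs.mkdirs(p);
        System.out.println(fs.exists(p));
        fs.delete(p, true); // clean up the scratch directory
    }
}
```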

Upvotes: 0

Alexander Tokarev

Reputation: 2763

If you're using Hadoop 2.0.0 or above, consider using hadoop-minicluster:

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-minicluster</artifactId>
    <version>2.5.0</version>
    <scope>test</scope>
</dependency>

With it, you can create a temporary hdfs on your local machine, and run your tests on it. A setUp method may look like this:

baseDir = Files.createTempDirectory("test_hdfs").toFile().getAbsoluteFile();
Configuration conf = new Configuration();
conf.set(MiniDFSCluster.HDFS_MINIDFS_BASEDIR, baseDir.getAbsolutePath());
MiniDFSCluster.Builder builder = new MiniDFSCluster.Builder(conf);
hdfsCluster = builder.build();

String hdfsURI = "hdfs://localhost:"+ hdfsCluster.getNameNodePort() + "/";
DistributedFileSystem fileSystem = hdfsCluster.getFileSystem();

And in a tearDown method you should shut down your mini HDFS cluster and remove the temporary directory:

hdfsCluster.shutdown();
FileUtil.fullyDelete(baseDir);

Upvotes: 26

IceBox13

Reputation: 1358

Why not use a mocking framework like Mockito or PowerMock to mock your interactions with the FileSystem? Your unit tests should not depend on an actual FileSystem; they should just verify your code's behavior when interacting with it.
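A short sketch of what that looks like with Mockito; `MyHdfsReader` and its `countBytes` method are hypothetical stand-ins for whatever class takes the FileSystem as a dependency:

```java
// Hedged Mockito sketch: stub only the FileSystem calls the code under test makes.
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.junit.Test;

public class MyHdfsReaderTest {
    @Test
    public void reportsLengthFromFileStatus() throws Exception {
        FileSystem fs = mock(FileSystem.class);
        Path p = new Path("/data/input.txt");
        when(fs.exists(p)).thenReturn(true);
        // FileStatus(length, isdir, replication, blocksize, mtime, path)
        when(fs.getFileStatus(p)).thenReturn(
                new FileStatus(42, false, 1, 64, 0, p));
        // Hypothetical class under test, injected with the mock:
        // assertEquals(42, new MyHdfsReader(fs).countBytes(p));
    }
}
```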

Upvotes: 4

zohar

Reputation: 2378

What I have done (until I find a better solution) is extend FileSystem.

Upvotes: 1

fyr

Reputation: 20859

One possible way would be to use TemporaryFolder in JUnit 4.7.

See: http://www.infoq.com/news/2009/07/junit-4.7-rules or http://weblogs.java.net/blog/johnsmart/archive/2009/09/29/working-temporary-files-junit-47.
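A sketch of combining the TemporaryFolder rule with Hadoop's local FileSystem, so each test gets a throwaway directory that JUnit deletes afterwards:

```java
// Hedged sketch: JUnit 4 TemporaryFolder rule + Hadoop's local FileSystem.
import static org.junit.Assert.assertTrue;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.junit.Rule;
import org.junit.Test;
import org.junit.rules.TemporaryFolder;

public class TempFolderFsTest {
    @Rule
    public TemporaryFolder tmp = new TemporaryFolder(); // deleted after each test

    @Test
    public void writesIntoTemporaryFolder() throws Exception {
        FileSystem fs = FileSystem.getLocal(new Configuration());
        Path dir = new Path(tmp.getRoot().getAbsolutePath(), "out");
        assertTrue(fs.mkdirs(dir));
        assertTrue(fs.exists(dir));
    }
}
```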

Upvotes: 2
