Dave Kincaid
Dave Kincaid

Reputation: 4160

Hadoop FileSystem.getFS() pauses for about 2 minutes

I'm having a very strange problem. I'm using dfs-datastores Pail abstraction to write data to HDFS in Java. I don't think the Pail piece is important to the problem though.

When it calls org.apache.hadoop.fs.FileSystem getFS(java.lang.String path) with a path on my local filesystem it pauses for about 2 minutes seemingly doing nothing then returns. This is on my laptop.

The weird thing is that it worked really fast when I was on the network at my office today, but now that I'm home it's doing it again. I'm running Ubuntu 10.10 64-bit with Java 1.7.

Anyone have any ideas what it's doing? What could be different between being at work and being at home?

UPDATE: I've been stepping through code with the debugger and it seems to be having trouble in Configuration.loadResource(). It's calling that multiple times and it will take 5-10 seconds to return from that function.

UPDATE2: I've narrowed this down a little further. The biggest hang up seems to be when it calls KerberosName.setConfiguration(). Which would explain why it runs fast at work since the Active Directory acts as a Kerberos server. I don't have one here at home, so it can't find one. Now they question is why in the world it's trying to load the Java Kerberos stuff.

Upvotes: 1

Views: 430

Answers (1)

Dave Kincaid
Dave Kincaid

Reputation: 4160

I found a solution (or at least a work around). I installed the krb5-kdc package and now my little program runs fast without any unexplained pauses. After this I removed krb5-kdc, tested and it was still running fast. I removed /etc/krb5.conf and it started doing the pause again. It looks like using the Hadoop library on Ubuntu (at least) requires a /etc/krb5.conf file.

Maybe this will help someone else.

Upvotes: 1

Related Questions