Reputation: 4160
I'm having a very strange problem. I'm using dfs-datastores Pail abstraction to write data to HDFS in Java. I don't think the Pail piece is important to the problem though.
When it calls org.apache.hadoop.fs.FileSystem getFS(java.lang.String path) with a path on my local filesystem it pauses for about 2 minutes seemingly doing nothing then returns. This is on my laptop.
The weird thing is that it worked really fast when I was on the network at my office today, but now that I'm home it's doing it again. I'm running Ubuntu 10.10 64-bit with Java 1.7.
Anyone have any ideas what it's doing? What could be different between being at work and being at home?
UPDATE: I've been stepping through code with the debugger and it seems to be having trouble in Configuration.loadResource(). It's calling that multiple times and it will take 5-10 seconds to return from that function.
UPDATE2: I've narrowed this down a little further. The biggest hang up seems to be when it calls KerberosName.setConfiguration(). Which would explain why it runs fast at work since the Active Directory acts as a Kerberos server. I don't have one here at home, so it can't find one. Now they question is why in the world it's trying to load the Java Kerberos stuff.
Upvotes: 1
Views: 430
Reputation: 4160
I found a solution (or at least a work around). I installed the krb5-kdc package and now my little program runs fast without any unexplained pauses. After this I removed krb5-kdc, tested and it was still running fast. I removed /etc/krb5.conf and it started doing the pause again. It looks like using the Hadoop library on Ubuntu (at least) requires a /etc/krb5.conf file.
Maybe this will help someone else.
Upvotes: 1