I'm a newbie to Hadoop.
I heard that MapR is a better way to mount Hadoop HDFS than FUSE.
But most of the related articles only describe MapR's Hadoop distribution, not pure Apache Hadoop.
Does anyone have experience mounting pure Apache Hadoop with MapR?
Thanks in advance.
To sum up what Ted said:
You're not really "mounting pure Apache Hadoop with MapR". Hadoop shouldn't be confused with HDFS. Although the two terms are often used interchangeably in conversation, HDFS refers specifically to the distributed filesystem itself (hence the DFS in HDFS). With stock Apache Hadoop, you interact with HDFS through Hadoop's own commands; for example, "hadoop dfs -ls /" lists the contents of the HDFS root directory.
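As a rough sketch of what "interacting through Hadoop's own commands" looks like from a program, the Python below shells out to the Hadoop CLI and parses the listing. It assumes the `hadoop` launcher is on the PATH and that `-ls` prints the usual eight-column layout (permissions, replication, owner, group, size, date, time, path) after a "Found N items" header. `hdfs_ls` and `parse_ls_output` are hypothetical helper names, not part of any Hadoop API.

```python
import subprocess
from typing import List


def parse_ls_output(text: str) -> List[str]:
    """Extract paths from `hadoop fs -ls` output.

    Assumes the common eight-column layout:
    permissions, replication, owner, group, size, date, time, path.
    """
    paths = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "Found N items" header.
        if not line or line.startswith("Found "):
            continue
        # maxsplit=7 keeps a path containing spaces intact as the last field.
        fields = line.split(maxsplit=7)
        if len(fields) == 8:
            paths.append(fields[7])
    return paths


def hdfs_ls(path: str = "/") -> List[str]:
    """List an HDFS directory by shelling out to the Hadoop CLI (hypothetical helper)."""
    # `hadoop fs` works against any Hadoop-compatible filesystem;
    # `hadoop dfs` is the older HDFS-specific spelling of the same shell.
    result = subprocess.run(
        ["hadoop", "fs", "-ls", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_ls_output(result.stdout)
```

The point of the sketch is the indirection: every HDFS operation goes through the Hadoop client, rather than through the operating system's ordinary file calls.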
MapR goes beyond what Hadoop provides by default. First, you can interact with the filesystem through the more efficient MapR-FS (a rewrite of HDFS). Second, you can NFS-mount the cluster filesystem so that you can manipulate it natively, without any special tooling. It is treated like any other NFS filesystem, except that in this case it is distributed across your cluster.
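To illustrate what "manipulating the filesystem natively" means: once the cluster is NFS-mounted, ordinary file APIs work with no Hadoop client involved. The sketch below assumes a mount point such as /mapr/my.cluster.com (the cluster name is a placeholder) and uses only the Python standard library; `publish_report` is a hypothetical helper.

```python
import shutil
from pathlib import Path


def publish_report(local_file: str, mount_root: str, subdir: str = "reports") -> Path:
    """Copy a local file into the NFS-mounted cluster filesystem.

    `mount_root` is assumed to be the NFS mount point of the cluster,
    e.g. "/mapr/my.cluster.com" (placeholder name). Nothing here is
    MapR-specific: it is plain file I/O on a mounted directory.
    """
    target_dir = Path(mount_root) / subdir
    # Create the destination directory on the mount if it is missing.
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / Path(local_file).name
    # copy2 also preserves timestamps, like `cp -p`.
    shutil.copy2(local_file, target)
    return target
```

For example, `publish_report("/tmp/daily.csv", "/mapr/my.cluster.com")` would place the file under the mount's `reports` directory, where any other tool (shell scripts, reporting software, `ls`, `cp`) can see it immediately.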
MapR is much more than just a way to mount HDFS.
MapR includes Hadoop, many Apache ecosystem components, and many non-Apache components such as Cascading. It also includes LucidWorks, which bundles Solr.
MapR also includes a reimplementation of HDFS called MapR-FS. MapR-FS has higher performance, offers full read-write semantics, allows reads during writes, supports transactionally correct mirrors and snapshots, has no NameNode, scales without the hassle of federation, is inherently highly available without all the mess of the HA NameNode, and is accessible via a distributed NFS service.
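The "read-write semantics" and "read during write" points are easiest to see in code. The Python sketch below overwrites bytes at an arbitrary offset of an existing file and reads them back through a second handle while the writer is still open. On a POSIX-style mount this is routine, whereas HDFS does not support modifying bytes that have already been written. The file path is a placeholder, `patch_bytes` is a hypothetical helper, and the sketch shows the access pattern only, not anything MapR-specific.

```python
import os


def patch_bytes(path: str, offset: int, data: bytes) -> bytes:
    """Overwrite `data` at `offset` in an existing file, then read it back
    through a separate handle while the writer is still open.

    Works on any filesystem with POSIX-style random writes (local disk or
    an NFS mount); HDFS does not allow modifying bytes already written.
    """
    with open(path, "r+b") as writer:
        # In-place update: seek into the middle of the file and overwrite.
        writer.seek(offset)
        writer.write(data)
        # flush() hands the bytes to the OS; fsync() asks the OS to commit
        # them to storage (on an NFS mount, this pushes them to the server).
        writer.flush()
        os.fsync(writer.fileno())

        # Read during write: a second handle sees the update before the
        # writer has been closed.
        with open(path, "rb") as reader:
            reader.seek(offset)
            return reader.read(len(data))
```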
Oh, MapR-FS also supports the HBase API, in addition to the HDFS API and POSIX-ish access via NFS.
The MapReduce layer in MapR has been partially rewritten to take advantage of the file system's very high performance. This is how MapR was able to break the MinuteSort record last fall.
So naming aside, MapR includes all the open source software that you would get with any other distribution and much more besides. "Pure Hadoop" is next to useless. You need Pig and/or Hive. You probably should look into Cascading/Scalding. You may need Mahout. You definitely will need to connect your system to legacy data sources and reporting systems which is what NFS makes easy.
Keep in mind that mounting HDFS via NFS or FUSE doesn't get you where you want to be. HDFS just doesn't have suitable semantics for access via NFS or normal file system APIs. It makes too many compromises.
With MapR, on the other hand, you can even run databases like MySQL or PostgreSQL on top of the cluster's file system via NFS.
MapR comes in three editions.
M3 is free and gives you all the performance and scalability, but limits you to a single NFS server and provides no mirrors, snapshots, volume locality, or HBase-compatible API (you can still run HBase itself, of course). HA is also degraded in M3, so failing over certain functions takes an hour.
M5 costs money after the free trial period and adds snapshots, mirrors, the ability to force some data onto specific parts of the cluster topology, and unlimited NFS servers.
M7 also costs money and adds the HBase API to everything M5 offers.
See mapr.com for more info.