user13879368

Reputation:

What is the difference between HDFS and ADLS?

I am confused about how Azure Data Lake Store is different from HDFS. Can anyone please explain it in simple terms?

Upvotes: 1

Views: 7792

Answers (3)

Ankur Napa

Reputation: 1

  1. HDFS is not persistent in the way ADLS is. Storage and compute are tightly coupled in HDFS, so if I shut down the HDFS cluster I lose the data. Data in ADLS lives outside the cluster and survives it.
  2. In HDFS, the data belongs to one cluster and cannot easily be accessed from another cluster, whereas the same ADLS account can be read by several clusters.
  3. Data is stored as blocks in HDFS, while data is stored as objects in Azure.
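To make point 3 a bit more concrete, here is a rough sketch of how a file breaks down into HDFS blocks. The 128 MiB block size and replication factor of 3 are assumptions based on the usual `dfs.blocksize` and `dfs.replication` defaults, and the function name is made up for illustration. An object store, by contrast, exposes the file as a single object addressed by name.

```python
# Assumed values: common defaults for dfs.blocksize and dfs.replication.
BLOCK_SIZE = 128 * 1024 * 1024
REPLICATION = 3


def hdfs_block_layout(file_size_bytes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Estimate how many blocks and block replicas a file occupies in HDFS.

    Hypothetical helper for illustration only; it is not part of any Hadoop API.
    """
    # Ceiling division without floats: a partly filled last block still counts.
    blocks = -(-file_size_bytes // block_size)
    return {
        "blocks": blocks,
        "block_replicas": blocks * replication,
        "raw_bytes_stored": file_size_bytes * replication,
    }


if __name__ == "__main__":
    one_gib = 1024 * 1024 * 1024
    print(hdfs_block_layout(one_gib))
```

With these assumed defaults, a 1 GiB file is split into 8 blocks, and the cluster keeps 24 block replicas spread across its data nodes.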

Upvotes: 0

Yavar

Reputation: 11931

ADLS can be thought of as Microsoft-managed HDFS. So essentially, instead of setting up your own HDFS on Azure, you can use their managed service (largely without modifying your analytics or downstream code).

Upvotes: 1

Venkataraman R

Reputation: 12989

  • HDFS is a file system. HDFS stands for Hadoop Distributed File System. It is part of the Apache Hadoop ecosystem. Read more on HDFS

  • ADLS is an Azure storage offering from Microsoft. ADLS stands for Azure Data Lake Storage. It provides distributed storage for bulk data processing needs.

    • ADLS Gen2 is accessed through a Hadoop file system driver called Azure Blob File System (ABFS). It provides a file system interface similar to Hadoop's for addressing files and directories inside ADLS using a URI scheme. This makes it easier for applications that use HDFS to migrate to ADLS with few or no code changes. Clients that access HDFS through the HDFS driver get a similar experience when accessing ADLS through the ABFS driver.

Azure Data Lake Storage Gen2 URI

The Hadoop Filesystem driver that is compatible with Azure Data Lake Storage Gen2 is known by its scheme identifier abfs (Azure Blob File System). Consistent with other Hadoop Filesystem drivers, the ABFS driver employs a URI format to address files and directories within a Data Lake Storage Gen2 capable account.
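As a small illustration of the URI format, the sketch below builds an HDFS URI and an ADLS Gen2 URI side by side. The ABFS layout `abfs[s]://<file_system>@<account_name>.dfs.core.windows.net/<path>` follows the Microsoft documentation. The account, container, and host names, the port 8020, and the helper functions themselves are made-up placeholders, not part of any driver.

```python
from urllib.parse import urlsplit


def hdfs_uri(namenode_host, path, port=8020):
    """Build an hdfs:// URI. Port 8020 is an assumed NameNode RPC port."""
    return f"hdfs://{namenode_host}:{port}/{path.lstrip('/')}"


def abfs_uri(file_system, account_name, path, secure=True):
    """Build an ADLS Gen2 URI; 'abfss' is the TLS variant of the 'abfs' scheme."""
    scheme = "abfss" if secure else "abfs"
    return f"{scheme}://{file_system}@{account_name}.dfs.core.windows.net/{path.lstrip('/')}"


def describe(uri):
    """Split a URI into the parts a Hadoop-style client would look at."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        # For ABFS the part before '@' is the file system (container) name.
        "file_system": parts.username,
        "host": parts.hostname,
        "path": parts.path,
    }


if __name__ == "__main__":
    # All names below are hypothetical examples.
    print(describe(hdfs_uri("namenode.example.com", "sales/2024/data.csv")))
    print(describe(abfs_uri("raw", "mydatalake", "sales/2024/data.csv")))
```

In both cases the path after the host is the same. What changes is the scheme and the authority part, which is why code that takes its paths from configuration usually needs only the base URI swapped.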

More on Azure Data Lake Storage

Hadoop compatible access: Data Lake Storage Gen2 allows you to manage and access data just as you would with a Hadoop Distributed File System (HDFS). The new ABFS driver is available within all Apache Hadoop environments, including Azure HDInsight, Azure Databricks, and Azure Synapse Analytics to access data stored in Data Lake Storage Gen2.

UPDATE: Also read about the Hadoop Compatible File System (HCFS) specification, which ensures that a distributed file system API (such as the one for Azure Blob Storage) meets the set of requirements needed to work with the Apache Hadoop ecosystem, similar to HDFS. More on HCFS

Upvotes: 7
