Satya P

Reputation: 1057

Can we use Apache Spark to store data, or is it only a data processing tool?

I am new to Apache Spark and would like to know whether it is possible to store data using Apache Spark, or whether it is only a processing tool.

Thanks for spending your time, Satya

Upvotes: 5

Views: 12412

Answers (3)

jarasss

Reputation: 528

Spark is not a database, so it cannot "store data". It processes data and holds it temporarily in memory, but that is not persistent storage.

In a real-life use case you usually have a database or data repository from which Spark accesses the data.

Spark can access data that's in:

  • SQL databases (anything that can be connected to using a JDBC driver)
  • Local files
  • Cloud storage (e.g. Amazon S3)
  • NoSQL databases
  • Hadoop Distributed File System (HDFS)
  • and many more...

A detailed description can be found here: http://spark.apache.org/docs/latest/sql-programming-guide.html#sql
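
To make the list above a bit more concrete, here is a minimal PySpark sketch of reading from three of those sources. The paths, bucket name, JDBC URL, table name, and credentials are all made-up placeholders, and the S3 and JDBC reads assume the matching connector and driver jars are available to Spark.

```python
# Sketch only: every path, URL, table name, and credential below is a placeholder.


def load_examples(spark):
    """Read from three kinds of external storage using an existing SparkSession."""
    # Local file: CSV with a header row, column types inferred by Spark.
    local_df = spark.read.csv("file:///tmp/events.csv", header=True, inferSchema=True)

    # Cloud storage: Parquet files on Amazon S3 through the s3a connector
    # (assumes the hadoop-aws package is on the classpath).
    s3_df = spark.read.parquet("s3a://example-bucket/events/")

    # SQL database over JDBC (assumes the matching JDBC driver jar is available).
    jdbc_df = (
        spark.read.format("jdbc")
        .option("url", "jdbc:postgresql://db.example.com:5432/shop")
        .option("dbtable", "public.orders")
        .option("user", "spark_reader")
        .option("password", "change-me")
        .load()
    )
    return local_df, s3_df, jdbc_df


def main():
    # Imported here so the sketch can be read and loaded without pyspark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("sources-sketch").getOrCreate()
    for df in load_examples(spark):
        df.printSchema()
    spark.stop()
```

In each case Spark only reads the data. The data itself keeps living in the file, the bucket, or the database.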

Upvotes: 4

Alberto Bonsanto

Reputation: 18042

As you can read on Wikipedia, Apache Spark is defined as "an open source cluster computing framework".

"Computing" here means processing. In essence, Spark lets you work in a pipeline (somewhat ETL-like) fashion: you read the dataset, you process the data, and then you store the processed data, or models that describe the data.

If your main objective is distributed storage of your data, there are good alternatives such as HDFS (Hadoop Distributed File System), among others.
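
A rough PySpark sketch of that read, process, store pipeline could look like this. The `user_id` column, the JSON input, the Parquet output, and both paths are assumptions chosen for illustration only.

```python
# Sketch only: the column name, file formats, and paths are hypothetical.


def run_pipeline(spark, source_path, target_path):
    """Read raw JSON, aggregate it, and persist the result outside Spark."""
    # 1. Read the dataset from external storage.
    raw = spark.read.json(source_path)

    # 2. Process it: count the rows per user (assumes a "user_id" column).
    summary = raw.groupBy("user_id").count()

    # 3. Store the processed data back in external storage as Parquet.
    summary.write.mode("overwrite").parquet(target_path)
    return summary


def main():
    # Imported here so the sketch can be read and loaded without pyspark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("etl-sketch").getOrCreate()
    run_pipeline(spark, "hdfs:///data/raw/events.json", "hdfs:///data/out/user_counts")
    spark.stop()
```

The result only outlives the Spark application because step 3 writes it somewhere else, here to a path on HDFS.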

Upvotes: 0

Durga Viswanath Gadiraju

Reputation: 3956

Apache Spark is primarily a processing engine. It works with underlying storage systems such as HDFS, S3, and other supported file systems. It can also read data from relational databases. But primarily it is an in-memory distributed processing tool.

Upvotes: 0
