Reputation: 1057
I am new to Apache Spark and would like to know whether it is possible to store data using Apache Spark, or whether it is only a processing tool.
Thanks for spending your time, Satya
Upvotes: 5
Views: 12412
Reputation: 528
Spark is not a database, so it cannot "store data". It processes data and holds it temporarily in memory, but that's not persistent storage.
In a real-life use case you usually have a database or data repository from which Spark accesses the data.
Spark can access data held in a range of external sources. A detailed description can be found here: http://spark.apache.org/docs/latest/sql-programming-guide.html#sql
Upvotes: 4
Reputation: 18042
As you can read in Wikipedia, Apache Spark is defined as:
"an open source cluster computing framework"
"Computing" here refers to processing. In essence, Spark works as a pipeline (somewhat like ETL): you read the dataset, you process the data, and then you store the processed data, or the models that describe it.
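The read, process, store pipeline can be sketched in PySpark roughly like this. It is only a minimal sketch: the function name `run_pipeline`, the `age` column, and the file paths are made up for illustration, and it assumes you pass in an existing `SparkSession`.

```python
def run_pipeline(spark, input_path, output_path):
    """Read from external storage, process in Spark, write the result back out."""
    # Read: the data lives in an external store (local file, HDFS, S3, ...).
    people = spark.read.json(input_path)

    # Process: cache() marks the result to be kept in memory once it is computed.
    # That cache lasts only as long as the Spark application; it is not storage.
    adults = people.filter("age >= 18").cache()

    # Store: the result becomes persistent only when it is written out
    # to external storage, here as Parquet files.
    adults.write.mode("overwrite").parquet(output_path)
    return adults


# Usage, assuming pyspark is installed (paths are placeholders):
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.appName("storage-demo").getOrCreate()
#   run_pipeline(spark, "people.json", "adults.parquet")
#   spark.stop()
```

Once the application stops, the cached data is gone; only the files written under `output_path` remain.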
If your main objective is to store your data in a distributed way, there are good options such as HDFS (Hadoop Distributed File System), among others.
Upvotes: 0
Reputation: 3956
Apache Spark is primarily a processing engine. It works on top of underlying file systems such as HDFS, S3, and other supported file systems. It can also read data from relational databases. But primarily it is an in-memory distributed processing tool.
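To illustrate, here is a rough PySpark sketch of loading from each kind of source mentioned above. The function name `load_sources` and all of its arguments are hypothetical. It also assumes the data is in Parquet format, that the S3 connector (for `s3a://` paths) is available on the classpath, and that the JDBC driver for your database is installed.

```python
def load_sources(spark, hdfs_path, s3_path, jdbc_url, table, user, password):
    """Load three DataFrames from three different external systems."""
    # File systems are selected by the URI scheme of the path,
    # e.g. "hdfs://..." for HDFS or "s3a://..." for S3.
    from_hdfs = spark.read.parquet(hdfs_path)
    from_s3 = spark.read.parquet(s3_path)

    # Relational databases are read over JDBC.
    from_db = spark.read.jdbc(
        url=jdbc_url,
        table=table,
        properties={"user": user, "password": password},
    )
    return from_hdfs, from_s3, from_db
```

In every case Spark only reads the data; the storage itself stays with HDFS, S3, or the database.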
Upvotes: 0