Reputation: 369
Can I access a DataFrame from different SparkSessions?
Upvotes: 3
Views: 4647
Reputation: 575
You can "copy" a Dataset from one session to another. Suppose you have a Dataset ds1 in session spark1 and want to use it in session spark2. You can do this:
Dataset ds2 = spark2.sql("select 1 dummy").join(ds1).drop("dummy")
This gives you a copy of ds1 that belongs to session spark2.
Upvotes: 0
Reputation: 74679
tl;dr No, it's not possible to share a DataFrame between SparkSessions.

A DataFrame lives in one single SparkSession (just like an RDD within a SparkContext) that defines its visibility scope. The owning SparkSession becomes an integral part of a DataFrame, which you can see in the definition of the Dataset type constructor:
class Dataset[T] private[sql](
@transient val sparkSession: SparkSession, // <-- here
@DeveloperApi @InterfaceStability.Unstable @transient val queryExecution: QueryExecution,
encoder: Encoder[T])
extends Serializable {
You can access the SparkSession a DataFrame belongs to using the sparkSession attribute:
scala> val df = Seq(1,2,3).toDF("id")
df: org.apache.spark.sql.DataFrame = [id: int]
scala> df.sparkSession
res0: org.apache.spark.sql.SparkSession = org.apache.spark.sql.SparkSession@4832813d
scala> df.sparkSession == spark
res1: Boolean = true
Upvotes: 5
Reputation: 1
Use a global temporary view (createGlobalTempView). It is scoped to the Spark application rather than a single session, so other sessions in the same application can read it from the global_temp database; just do not stop the application with which you created the DataFrame, or the view disappears.
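As a rough sketch of that approach (the session variables and the people view name here are made up for illustration), a global temp view registered in one session can be queried from a second session of the same application:

```scala
import org.apache.spark.sql.SparkSession

// Session that owns the DataFrame
val spark1 = SparkSession.builder()
  .master("local[*]")
  .appName("global-temp-view-demo")
  .getOrCreate()

import spark1.implicits._
val df = Seq((1, "a"), (2, "b")).toDF("id", "name")

// Register an application-scoped view; it lives in the global_temp database
df.createGlobalTempView("people")

// A second session in the SAME application can read the view
val spark2 = spark1.newSession()
val df2 = spark2.sql("SELECT * FROM global_temp.people")
df2.show()
```

Note that this shares the view's contents, not the DataFrame object itself; df2 is a new DataFrame owned by spark2.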
Upvotes: -2
Reputation: 1031
You can always just write out the DataFrame you want to reuse, then read it back in the other session.
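A minimal sketch of that round trip (the storage path, format, and session names are assumptions; any durable format such as Parquet works):

```scala
import org.apache.spark.sql.SparkSession

// Session 1: build a DataFrame and persist it to durable storage
val spark1 = SparkSession.builder()
  .master("local[*]")
  .appName("writer")
  .getOrCreate()
import spark1.implicits._

val df = Seq(1, 2, 3).toDF("id")
df.write.mode("overwrite").parquet("/tmp/shared_df")

// Session 2 (even a different application, later in time): read it back
val spark2 = spark1.newSession()
val dfAgain = spark2.read.parquet("/tmp/shared_df")
dfAgain.show()
```

Unlike a global temp view, this also works across separate applications, since the data lives on disk rather than in the application's memory.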
Upvotes: 1