j pavan kumar

Reputation: 369

Can DataFrame be accessed from different SparkSessions?

Can I access a DataFrame from different SparkSessions?

Upvotes: 3

Views: 4647

Answers (4)

Paul Z Wu

Reputation: 575

One can "copy" a Dataset from one session to another. Suppose you have a Dataset ds1 in session spark1 and you want to use it in session spark2; you can do this:

Dataset<Row> ds2 = spark2.sql("select 1 as dummy").crossJoin(ds1).drop("dummy");

(A condition-less join is a cartesian product, so use crossJoin explicitly rather than join.) By doing this, you have a copy of ds1 in session spark2.
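A minimal Scala sketch of this trick. The session names and the `local[*]` master here are illustrative; it only works at all because both sessions share the same underlying SparkContext:

```scala
import org.apache.spark.sql.SparkSession

val spark1 = SparkSession.builder().master("local[*]").getOrCreate()
val spark2 = spark1.newSession() // second session, same SparkContext

import spark1.implicits._
val ds1 = Seq(1, 2, 3).toDF("id") // belongs to spark1

// Cross-join a one-row literal frame owned by spark2 with ds1,
// then drop the dummy column: the result is a copy of ds1
// whose owning session is spark2.
val ds2 = spark2.sql("select 1 as dummy").crossJoin(ds1).drop("dummy")
```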

Upvotes: 0

Jacek Laskowski

Reputation: 74679

tl;dr No, it's not possible to share a DataFrame between SparkSessions.

A DataFrame lives in exactly one SparkSession (just as an RDD lives within one SparkContext), which defines its visibility scope. The owning SparkSession is an integral part of a DataFrame, as you can see in the definition of the Dataset type constructor:

class Dataset[T] private[sql](
    @transient val sparkSession: SparkSession,   // <-- here
    @DeveloperApi @InterfaceStability.Unstable @transient val queryExecution: QueryExecution,
    encoder: Encoder[T])
  extends Serializable {

You can access the SparkSession a DataFrame belongs to via its sparkSession attribute:

scala> val df = Seq(1,2,3).toDF("id")
df: org.apache.spark.sql.DataFrame = [id: int]

scala> df.sparkSession
res0: org.apache.spark.sql.SparkSession = org.apache.spark.sql.SparkSession@4832813d

scala> df.sparkSession == spark
res1: Boolean = true
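To see that scoping in action, a temporary view registered in one session is not visible from a sibling session created with newSession. A sketch, continuing from the spark-shell session above:

```scala
// newSession shares the SparkContext but has its own catalog and
// session state, so spark's temp views are invisible to it.
val spark2 = spark.newSession()

df.createOrReplaceTempView("ids")
spark.sql("select * from ids").show() // works in the owning session

// spark2.sql("select * from ids")    // throws: Table or view not found: ids
```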

Upvotes: 5

user1212

Reputation: 1

Use a global temp view (createGlobalTempView). It is scoped to the Spark application rather than a single session, so other sessions can read it through the reserved global_temp database; just do not terminate the application with which you created the DataFrame.
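A short Scala sketch of this approach (the view name is illustrative):

```scala
// Register the DataFrame as a global temp view in the owning session.
df.createGlobalTempView("my_df")

// Any other session in the same Spark application can read it
// through the reserved global_temp database.
val spark2 = spark.newSession()
val df2 = spark2.sql("select * from global_temp.my_df")
```

Note that df2 is a new DataFrame owned by spark2 that reads the same data, not the original object shared across sessions.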

Upvotes: -2

David Schuler

Reputation: 1031

You can always write out the DataFrame you want to reuse, then read it back the next time you need it, even from a different SparkSession or a different application.
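A sketch of the write-then-read round trip in Scala, using Parquet (the path is illustrative):

```scala
// Persist the DataFrame to durable storage.
df.write.mode("overwrite").parquet("/tmp/my_df")

// Later, from any other session or application:
val df2 = spark2.read.parquet("/tmp/my_df")
```

This trades a disk round trip for complete independence from the original session's lifetime.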

Upvotes: 1
