Tom

Reputation: 6332

What does Dataset's as method really mean?

I have simple code:

  test("Dataset as method") {
    val spark = SparkSession.builder().master("local").appName("Dataset as method").getOrCreate()
    import spark.implicits._
    //xyz is an alias of ds1
    val ds1 = Seq("1", "2").toDS().as("xyz")
    //xyz can be used to refer to the value column  
    ds1.select($"xyz.value").show(truncate = false)
    //ERROR here, no table or view named xyz
    spark.sql("select * from xyz").show(truncate = false)
  }

It looks to me as if xyz acts like a table name, but the SQL query select * from xyz raises an error complaining that xyz doesn't exist.

So what does the as method really mean, and how should I use an alias like xyz in my case?

Upvotes: 0

Views: 73

Answers (1)

Ramesh Maharjan

Reputation: 41957

When called on a Dataset with a String argument, as in your case, .as() sets an alias for that Dataset, as you can see in the API source:

  /**
   * Returns a new Dataset with an alias set.
   *
   * @group typedrel
   * @since 1.6.0
   */
  def as(alias: String): Dataset[T] = withTypedPlan {
    SubqueryAlias(alias, logicalPlan)
  }

The alias can be used only in the function APIs such as select, join, and filter. It cannot be used in SQL queries, because setting an alias does not register a table or view with the session.

This is easier to see if you create a two-column Dataset and alias it as you did:

val ds1 = Seq(("1", "2"),("3", "4")).toDS().as("xyz")

Now you can select a single column through the alias:

ds1.select($"xyz._1").show(truncate = false)

which should give you

+---+
|_1 |
+---+
|1  |
|3  |
+---+

The alias is most useful when you join two Datasets that have the same column names, because you can write the join condition using the aliases.
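As a rough sketch of that join case (the object name AliasJoinSketch, the helper joinOnFirstColumn, the aliases l and r, and the sample data are all made up for illustration), both Datasets below have columns named _1 and _2, and the aliases are what keep the join condition and the select unambiguous:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object AliasJoinSketch {
  // Hypothetical helper: inner-joins two aliased Datasets on their first column.
  def joinOnFirstColumn(spark: SparkSession): DataFrame = {
    import spark.implicits._

    // Both Datasets have columns _1 and _2, so plain column names would be ambiguous.
    val left = Seq(("1", "a"), ("2", "b")).toDS().as("l")
    val right = Seq(("1", "x"), ("3", "y")).toDS().as("r")

    // The aliases qualify each column in the join condition and in the select.
    left.join(right, $"l._1" === $"r._1")
      .select($"l._1", $"l._2", $"r._2")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local").appName("alias join sketch").getOrCreate()
    // Only the key "1" appears on both sides, so the inner join keeps a single row.
    joinOnFirstColumn(spark).show(truncate = false)
    spark.stop()
  }
}
```

Without the aliases, a condition such as $"_1" === $"_1" would not say which side each column belongs to.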

But to use the name in SQL queries, you have to register the Dataset as a temporary table:

ds1.registerTempTable("xyz")
spark.sql("select * from xyz").show(truncate = false)

which should give you the correct result

+---+---+
|_1 |_2 |
+---+---+
|1  |2  |
|3  |4  |
+---+---+

Or, even better, use the newer createOrReplaceTempView, which replaces registerTempTable (deprecated since Spark 2.0):

ds1.createOrReplaceTempView("xyz")
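To make the difference between the two names concrete, here is a minimal sketch. The object name AliasVersusViewSketch, the helper firstColumnViaView, and the view name xyz_view are assumptions chosen for illustration; the view is deliberately named differently from the alias to show that the two are independent:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object AliasVersusViewSketch {
  // Hypothetical helper: registers a temp view and queries it through SQL.
  def firstColumnViaView(spark: SparkSession): DataFrame = {
    import spark.implicits._

    val ds1 = Seq(("1", "2"), ("3", "4")).toDS().as("xyz")

    // The alias xyz is usable in the function API on this Dataset...
    ds1.select($"xyz._1").show(truncate = false)

    // ...but SQL only sees names registered with the session, such as this view.
    ds1.createOrReplaceTempView("xyz_view")
    spark.sql("select _1 from xyz_view")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local").appName("alias vs view sketch").getOrCreate()
    firstColumnViaView(spark).show(truncate = false)
    spark.stop()
  }
}
```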

Upvotes: 2
