Failing to convert a DataFrame to a Dataset of objects with a custom enumeration field

Question

I am facing an issue when trying to convert a DataFrame to a Dataset of objects that have a custom field.

In the code below, I have a DataFrame with two columns, country and currency. I want to convert it into a Dataset of the MyObj case class, where country is a String and currency is an enumeration.

Here is the code:

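import org.apache.spark.sql.Encoders
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val schema = StructType(Seq(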
  StructField("country", StringType),
  StructField("currency", StringType)
))

// Define the sample data
val data = Seq(
  ("France", "EUR"),
  ("USA", "DOLLAR"),
  ("Germany", "EUR")
)

// Create a DataFrame from the sample data
val df = sparkSession.createDataFrame(data).toDF(schema.fieldNames: _*)

class Currency extends Enumeration {
  type Currency = Value
  val EUR = Value("EUR")
  val DOLLAR = Value("DOLLAR")
}

case class MyObj(country: String, currency: Currency)

val dsProduct = df.as[MyObj](Encoders.product[MyObj])

Here is the error I face when executing the program:

Exception in thread "main" org.apache.spark.sql.AnalysisException: Try to map struct to Tuple1, but failed as the number of fields does not line up.

If I change the currency type to a string, it works just fine, but I want to keep it as an enumeration for another use case.
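For reference, here is a minimal sketch of that string-based variant, without the Spark part. It assumes Currency is declared as an object rather than a class, and the names MyObjRaw and currencyEnum are hypothetical, introduced only for illustration. The idea is that the field stays a String, which is the form that converts without error, and the enumeration value is looked up on demand with withName.

```scala
// Assumption: Currency is an object (not a class) so its values are statically reachable.
object Currency extends Enumeration {
  type Currency = Value
  val EUR = Value("EUR")
  val DOLLAR = Value("DOLLAR")
}

// Hypothetical variant of MyObj: currency is kept as a plain String.
case class MyObjRaw(country: String, currency: String) {
  // Looks up the enumeration value by name.
  // Throws NoSuchElementException if the string matches no declared value.
  def currencyEnum: Currency.Value = Currency.withName(currency)
}

object EnumSketch {
  def main(args: Array[String]): Unit = {
    // Same sample rows as in the question, built directly instead of via Spark.
    val rows = Seq(
      MyObjRaw("France", "EUR"),
      MyObjRaw("USA", "DOLLAR"),
      MyObjRaw("Germany", "EUR")
    )
    rows.foreach(r => println(s"${r.country} -> ${r.currencyEnum}"))
  }
}
```

With this shape, the conversion would presumably be df.as[MyObjRaw], but that loses the enumeration type on the field itself, which is what I would like to avoid.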

Any idea how I can fix this?
