covfefe

How to parse column (with list data) within a DataFrame?

I have a DataFrame column that contains a list. I want to take the first element of each list and replace the column with it. For example:

col1
[elem1, elem2]
[elem3, elem4]

I want to make this:

col1
elem1
elem3

I've tried dataFrameName.withColumn("col1", explode($"col1")) but it gives me a NoSuchElementException. What's the right way to do this?

Answers (1)

Leo C

To replace the ArrayType column col1 with its first element, explode is the wrong tool: it produces one row per array element rather than picking a single element. Instead, index into the array with $"col1"(0) (or, equivalently, $"col1".getItem(0)), as shown below:

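// `spark` is the active SparkSession (predefined in spark-shell)
import spark.implicits._
import org.apache.spark.sql.functions._

// Sample DataFrame with a single ArrayType(StringType) column
val df = Seq(
  Seq("elem1", "elem2"),
  Seq("elem3", "elem4")
).toDF("col1")

// Overwrite col1 with the element at index 0 of each array
df.withColumn("col1", $"col1"(0)).show()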
// +-----+
// | col1|
// +-----+
// |elem1|
// |elem3|
// +-----+

Note that the NoSuchElementException probably has a separate cause: calling explode on an ArrayType column does not normally throw that exception.
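
If it helps to investigate further, one possible starting point is to confirm that col1 really is an ArrayType and to see how empty arrays behave. The following is only a minimal sketch, not a diagnosis of the asker's data: it assumes a local SparkSession, and the sample rows, the name firstOnly, and the size guard are illustrative additions that are not part of the original answer.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Assumption: a local session for experimentation; in spark-shell, `spark` already exists
val spark = SparkSession.builder().master("local[*]").appName("first-element").getOrCreate()
import spark.implicits._

// Hypothetical sample data: one normal row and one row with an empty array
val df = Seq(
  Seq("elem1", "elem2"),
  Seq.empty[String]
).toDF("col1")

// Confirm that col1 is an array column and not, say, a plain string
df.printSchema()

// explode emits one row per element, so the row with the empty array produces no output rows
df.select(explode($"col1").as("col1")).show()

// Take the first element only when the array is non-empty; otherwise the result is null
val firstOnly = df.withColumn("col1", when(size($"col1") > 0, $"col1"(0)))
firstOnly.show()
```

The size guard is a defensive choice for arrays that might be empty: it avoids depending on how the Spark version and its ANSI setting treat an out-of-range index. If rows with empty or null arrays should be kept while still exploding, explode_outer is the variant that preserves them.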
