Reputation: 502
I grouped by a few columns and am getting WrappedArray out of those columns, as you can see in the schema below. How do I get rid of the arrays so I can proceed to the next step and do an orderBy?
val sqlDF = spark.sql("SELECT * FROM parquet.`parquet/20171009121227/rels/*.parquet`")
That gives a DataFrame. After grouping, I select the collected columns:
val final_df = groupedBy_DF.select(
groupedBy_DF("collect_list(relev)").as("rel"),
groupedBy_DF("collect_list(relev2)").as("rel2"))
Then printing the schema with final_df.printSchema gives:
|-- rel: array (nullable = true)
| |-- element: double (containsNull = true)
|-- rel2: array (nullable = true)
| |-- element: double (containsNull = true)
Sample current output (screenshot omitted; the columns show up as WrappedArray values).
I am trying to convert to this:
|-- rel: double (nullable = true)
|-- rel2: double (nullable = true)
Desired example output:
-1.0,0.0
-1.0,0.0
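For reference, a minimal sketch that reproduces the array schema described above, using made-up data ("key", and the sample column names relev/relev2, stand in for the real parquet columns):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.collect_list

val spark = SparkSession.builder.master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical sample data standing in for the parquet rows
val df = Seq(("a", -1.0, 0.0), ("b", -1.0, 0.0)).toDF("key", "relev", "relev2")

val groupedBy_DF = df.groupBy("key")
  .agg(collect_list("relev"), collect_list("relev2"))

groupedBy_DF.printSchema()
// collect_list(relev) and collect_list(relev2) come back as array<double>,
// which is why the rows render as WrappedArray(...)
```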
Upvotes: 2
Views: 3955
Reputation: 1
Try split (note that split expects a string column, so the array column has to be in string form first):
import org.apache.spark.sql.functions._
val final_df = groupedBy_DF.select(
  groupedBy_DF("collect_list(relev)").as("rel"),
  groupedBy_DF("collect_list(relev2)").as("rel2"))
  .withColumn("rel", split(col("rel"), ","))  // split takes a Column, not a String
Upvotes: 0
Reputation: 28322
In the case where collect_list will always return only one value, use first instead. Then there is no need to handle an Array at all. Note that this should be done during the groupBy step.
val spark = SparkSession.builder.getOrCreate()
import spark.implicits._
import org.apache.spark.sql.functions.first
val final_df = df.groupBy(...)
.agg(first($"relev").as("rel"),
first($"relev2").as("rel2"))
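A runnable sketch of this approach, filling the elided groupBy with a hypothetical key column and made-up data:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.first

val spark = SparkSession.builder.master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical data and grouping key
val df = Seq(("a", -1.0, 0.0), ("b", -1.0, 0.0)).toDF("key", "relev", "relev2")

val final_df = df.groupBy("key")
  .agg(first($"relev").as("rel"), first($"relev2").as("rel2"))

final_df.printSchema()           // rel and rel2 are now plain doubles
final_df.orderBy($"rel").show()  // so orderBy works directly
```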
Upvotes: 3
Reputation: 1953
Try col(x).getItem:
import org.apache.spark.sql.functions.col

groupedBy_DF.select(
  groupedBy_DF("collect_list(relev)").as("rel"),
  groupedBy_DF("collect_list(relev2)").as("rel2")
).withColumn("rel_0", col("rel").getItem(0))
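Assuming the select above is bound to a name such as final_df and each array has at least one element, both columns can be flattened the same way:

```scala
import org.apache.spark.sql.functions.col

// getItem(0) pulls the first element out of each array as a plain double;
// it yields null if an array is empty.
val flat_df = final_df
  .withColumn("rel_0", col("rel").getItem(0))
  .withColumn("rel2_0", col("rel2").getItem(0))
```

The flattened rel_0/rel2_0 columns can then be used in an orderBy.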
Upvotes: 1