werner

Reputation: 14845

Counting elements in array column with map

I have a DataFrame with a single column that contains arrays:

val df: DataFrame = Seq(
  (Array("a", "b", "c")),
  (Array("d", "e"))
).toDF("value")

Schema:

root
 |-- value: array (nullable = true)
 |    |-- element: string (containsNull = true)

When I count the number of elements in each array using a column expression, I get the expected result:

df.select(size($"value")).show

prints

+-----------+
|size(value)|
+-----------+
|          3|
|          2|
+-----------+

When I try to map each row to its size, I only get a 1 in each row:

df.map(_.size).show

prints

+-----+
|value|
+-----+
|    1|
|    1|
+-----+

Why does the second version only print 1 for each array instead of the array's size?

Upvotes: 1

Views: 689

Answers (1)

Raphael Roth

Reputation: 27373

Calling size on a Row returns the number of columns (fields) in that row, not the length of the array stored in it. The documentation describes it as:

Number of elements in the Row

which is 1 in your case.
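
To make the difference concrete, here is a minimal sketch that builds a single Row by hand instead of going through a DataFrame. The object name RowSizeDemo and its fields are made up for illustration; only Row, size and getSeq come from Spark.

```scala
import org.apache.spark.sql.Row

// Hypothetical demo object, not part of the question's code.
object RowSizeDemo {
  // One Row with a single field holding a three-element sequence,
  // mirroring the first row of the question's DataFrame.
  val row: Row = Row(Seq("a", "b", "c"))

  // Number of fields in the Row; this is what df.map(_.size) computes.
  val fieldCount: Int = row.size

  // Number of elements in the array stored in field 0.
  val elementCount: Int = row.getSeq[String](0).size
}
```

Here fieldCount is 1 because the Row has one column, while elementCount is 3 because it looks inside that column.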

What you can do instead is:

df.map(_.getSeq(0).size)
  .show()

gives:

+-----+
|value|
+-----+
|    3|
|    2|
+-----+
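
Another option, sketched here under the assumption that the column is always an array of strings, is to convert the DataFrame to a typed Dataset[Seq[String]] first, so that the lambda in map receives the array itself rather than a Row. The object ArraySizes and the method sizes are hypothetical names; the SparkSession is passed in by the caller.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical helper, written only to illustrate the typed approach.
object ArraySizes {
  def sizes(spark: SparkSession): Array[Int] = {
    // Brings the encoders for Array[String], Seq[String] and Int into scope.
    import spark.implicits._

    // Same data as in the question.
    val df = Seq(Array("a", "b", "c"), Array("d", "e")).toDF("value")

    // Each element is now the array from the "value" column, not a Row.
    val ds: Dataset[Seq[String]] = df.as[Seq[String]]

    // size now refers to the collection, so it counts array elements.
    ds.map(_.size).collect()
  }
}
```

In spark-shell this could be called as ArraySizes.sizes(spark); for the question's data it should yield the sizes 3 and 2.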

Upvotes: 4
