Reputation: 14845
I have a DataFrame that contains a single column of arrays:
val df: DataFrame = Seq(
(Array("a", "b", "c")),
(Array("d", "e"))
).toDF("value")
Schema:
root
 |-- value: array (nullable = true)
 |    |-- element: string (containsNull = true)
When I count the number of elements in each array using a column expression, I get the expected result:
df.select(size($"value")).show
prints
+-----------+
|size(value)|
+-----------+
|          3|
|          2|
+-----------+
When I try to map each row to its size, I only get a 1 in each row:
df.map(_.size).show
prints
+-----+
|value|
+-----+
|    1|
|    1|
+-----+
Why does the second version only print 1 for each array instead of the array's size?
Upvotes: 1
Views: 689
Reputation: 27373
Calling size on a Row gives the number of columns/fields in that row, not the length of any array stored in it. The doc says:
Number of elements in the Row
which is 1 in your case.
What you can do instead is:
df.map(_.getSeq(0).size)
  .show()
gives:
+-----+
|value|
+-----+
|    3|
|    2|
+-----+
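To see the difference side by side, here is a minimal sketch, assuming Spark 2.x or later and a local SparkSession. The object name RowSizeDemo and the helper run are made up for illustration. It compares Row.size with two ways of reaching the array itself: getSeq on the Row, and converting the DataFrame to a typed Dataset[Seq[String]] so that map receives the array rather than the Row.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical demo object; the names are illustrative only.
object RowSizeDemo {

  // Returns (Row.size per row, getSeq(0).size per row, typed Dataset size per row).
  def run(spark: SparkSession): (Seq[Int], Seq[Int], Seq[Int]) = {
    import spark.implicits._

    val df = Seq(Array("a", "b", "c"), Array("d", "e")).toDF("value")

    // Row.size counts the fields of the Row. There is one column, so this is 1.
    val fieldCounts = df.map(_.size).collect().toSeq

    // Read column 0 as a Seq and take its length.
    val viaGetSeq = df.map(_.getSeq[String](0).size).collect().toSeq

    // With a typed Dataset, each element is the array itself, not a Row.
    val viaTyped = df.as[Seq[String]].map(_.size).collect().toSeq

    (fieldCounts, viaGetSeq, viaTyped)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a single-threaded local master is enough for this demo.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("row-size-demo")
      .getOrCreate()

    val (fieldCounts, viaGetSeq, viaTyped) = run(spark)
    println(fieldCounts.mkString(", ")) // 1, 1
    println(viaGetSeq.mkString(", "))   // 3, 2
    println(viaTyped.mkString(", "))    // 3, 2

    spark.stop()
  }
}
```

The typed variant avoids positional access by index, at the cost of requiring that the column type matches the encoder (here Seq[String]).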
Upvotes: 4