rafald
rafald

Reputation: 23

How to provide value from the same row to scala spark substring function?

I've got following Dataframe with fnamelname column that I want to transform:

+---+---------------+---+----------+--------+
| id|     fnamelname|age|       job|ageLimit|
+---+---------------+---+----------+--------+
|  1|    xxxxx xxxxx| 28|   teacher|      18|
|  2|  xxxx xxxxxxxx| 30|programmer|       0|
|  3|    xxxxx xxxxx| 28|   teacher|      18|
|  8|xxxxxxx xxxxxxx| 12|programmer|       0|
|  9| xxxxx xxxxxxxx| 45|programmer|       0|
+---+---------------+---+----------+--------+
only showing top 5 rows

root
 |-- id: string (nullable = true)
 |-- fnamelname: string (nullable = true)
 |-- age: integer (nullable = false)
 |-- job: string (nullable = true)
 |-- ageLimit: integer (nullable = false)

I want to use ageLimit as a len value within substring function, but somehow .cast("Int") function doesn't apply to a value of that row.

val readyDF: Dataset[Row] = peopleWithJobsAndAgeLimitsDF.withColumn("fnamelname",
  substring(col("fnamelname"), 0, col("ageLimit").cast("Int")))

All I'm getting is:

 found   : org.apache.spark.sql.Column
 required: Int
        col("fnamelname"),0, col("ageLimit").cast("Int")))

How to provide a value of another column as a variable to function within .withColumn()?

Upvotes: 0

Views: 350

Answers (2)

Hristo Iliev
Hristo Iliev

Reputation: 74395

The substring function takes an Int argument for the substring length. col("ageLimit").cast("Int") is not Int but another Column object holding the integer values of whatever was in the ageLimit column.

Instead, use the substr method of Column. It has an overload that takes two Columns for the position and the substring length. To pass a literal 0 for the position column, use lit(0):

val readyDF = peopleWithJobsAndAgeLimitsDF.withColumn("fnamelname",
  col("fnamelname").substr(lit(0), col("ageLimit")))

Upvotes: 4

partlov
partlov

Reputation: 14277

You can't do this directly using substring (or any other function with similar signature). You must use expr, so solution would be something like:

peopleWithJobsAndAgeLimitsDF
.withColumn(
  "fnamelname",
  expr("substring(fnamelname, 0, ageLimit)")
)

Upvotes: 3

Related Questions