Reputation: 23
I've got following Dataframe with fnamelname
column that I want to transform:
+---+---------------+---+----------+--------+
| id| fnamelname|age| job|ageLimit|
+---+---------------+---+----------+--------+
| 1| xxxxx xxxxx| 28| teacher| 18|
| 2| xxxx xxxxxxxx| 30|programmer| 0|
| 3| xxxxx xxxxx| 28| teacher| 18|
| 8|xxxxxxx xxxxxxx| 12|programmer| 0|
| 9| xxxxx xxxxxxxx| 45|programmer| 0|
+---+---------------+---+----------+--------+
only showing top 5 rows
root
|-- id: string (nullable = true)
|-- fnamelname: string (nullable = true)
|-- age: integer (nullable = false)
|-- job: string (nullable = true)
|-- ageLimit: integer (nullable = false)
I want to use ageLimit
as a len value within substring
function, but somehow .cast("Int")
function doesn't apply to a value of that row.
val readyDF: Dataset[Row] = peopleWithJobsAndAgeLimitsDF.withColumn("fnamelname",
substring(col("fnamelname"), 0, col("ageLimit").cast("Int")))
All I'm getting is:
found : org.apache.spark.sql.Column
required: Int
col("fnamelname"),0, col("ageLimit").cast("Int")))
How to provide a value of another column as a variable to function within .withColumn()
?
Upvotes: 0
Views: 350
Reputation: 74395
The substring
function takes an Int
argument for the substring length. col("ageLimit").cast("Int")
is not Int
but another Column
object holding the integer values of whatever was in the ageLimit
column.
Instead, use the substr
method of Column
. It has an overload that takes two Column
s for the position and the substring length. To pass a literal 0
for the position column, use lit(0)
:
val readyDF = peopleWithJobsAndAgeLimitsDF.withColumn("fnamelname",
col("fnamelname").substr(lit(0), col("ageLimit")))
Upvotes: 4
Reputation: 14277
You can't do this directly using substring
(or any other function with similar signature). You must use expr
, so solution would be something like:
peopleWithJobsAndAgeLimitsDF
.withColumn(
"fnamelname",
expr("substring(fnamelname, 0, ageLimit)")
)
Upvotes: 3