Kzryzstof

Reputation: 8382

How can I truncate the length of a string in a DataFrame Column?

I have a DataFrame that contains columns with text and I want to truncate the text in a Column to a certain length. I tried the following operation:

val updatedDataFrame = dataFrame.withColumn("NewColumn", col("ExistingColumn").take(15))

I get the following error, because take is called on the Column object itself rather than on the string values it holds:

notebook:7: error: value take is not a member of org.apache.spark.sql.Column
.withColumn("NewColumn", col("ExistingColumn").take(15))

Upvotes: 1

Views: 6187

Answers (1)

Leo C

Reputation: 22439

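Use the substring function from org.apache.spark.sql.functions. It takes the column, a 1-based start position, and a length, and it operates on each row's string value: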

import org.apache.spark.sql.functions._
import spark.implicits._

val df = Seq( (1, "abcdef"), (2, "uvwx") ).toDF("id", "value")

df.withColumn("value3", substring($"value", 1, 3)).show
// +---+------+------+
// | id| value|value3|
// +---+------+------+
// |  1|abcdef|   abc|
// |  2|  uvwx|   uvw|
// +---+------+------+
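
Applied to the column names from the question, the same call might look like the sketch below. The sample rows and the local SparkSession are assumptions made only so the snippet is self-contained; in a notebook, spark and dataFrame would normally already exist. It is written as top-level statements, as you would type them in a notebook or spark-shell.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, substring}

// Assumption: a local session for illustration; notebooks usually provide `spark`.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("truncate-example")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample data standing in for the asker's dataFrame.
val dataFrame = Seq(
  "a string longer than fifteen characters",
  "short"
).toDF("ExistingColumn")

// substring(column, pos, len): pos is 1-based, len is the maximum number of characters.
// Values shorter than len are returned unchanged.
val updatedDataFrame =
  dataFrame.withColumn("NewColumn", substring(col("ExistingColumn"), 1, 15))

updatedDataFrame.show(false)
```

Strings shorter than 15 characters pass through untouched, so no separate length check is needed. The Column method substr(1, 15) should be an equivalent way to write the same expression.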

Upvotes: 3
