Reputation: 3
I have an existing DataFrame 'df' with a column 'list_len', and I want to add a column containing an array of empty strings whose length is given by the value of 'list_len'.
In PySpark I tried
df.withColumn('new_list', array(['']*col('list_len'))).show()
but it did not work.
Any ideas or help would be greatly appreciated! The desired output is:
+--------+------------+
|list_len|    new_list|
+--------+------------+
|       1|        ['']|
|       3|['', '', '']|
|       2|    ['', '']|
+--------+------------+
Upvotes: 0
Views: 439
Reputation: 650
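Use array_repeat (available since Spark 2.4). Python's list repetition (['']*n) needs a plain integer, whereas col('list_len') is a Column expression that Spark only evaluates per row, so the repetition has to be expressed as a Spark function.
Scala:
import org.apache.spark.sql.functions.{lit, array_repeat}
import spark.implicits._
// Sample data with the same column name as in the question
val df = Seq(1, 2, 3).toDF("list_len")
// Repeat the empty-string literal list_len times in each row
df.withColumn("new_list", array_repeat(lit(""), $"list_len"))
PySpark:
from pyspark.sql.functions import lit, array_repeat, col
# Repeat the empty-string literal list_len times in each row
df.withColumn("new_list", array_repeat(lit(""), col("list_len")))
Reference: https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.functions.array_repeat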
Upvotes: 1