elbuenretiro

Reputation: 3

Create a Spark DataFrame column with a list as its data type

I have an existing DataFrame 'df' with a column 'list_len', and I want to create a new column consisting of a list of empty strings whose length is given by the value of 'list_len'.

I tried df.withColumn('new_list', array(['']*col('list_len'))).show() in PySpark, but it did not work.

Any idea/help is greatly appreciated! The desired output would look like this:

+---------+------------------+
|list_len |        new_list  |
+---------+------------------+
|        1|              ['']|
|        3|      ['', '', '']|
|        2|          ['', '']|
+---------+------------------+

Upvotes: 0

Views: 439

Answers (1)

Matt

Reputation: 650

Scala:

import org.apache.spark.sql.functions.{lit, array_repeat}
import spark.implicits._  // requires a SparkSession named `spark` in scope

val df = Seq(1, 2, 3).toDF("list_len")
// for each row, build an array containing the empty string repeated `list_len` times
df.withColumn("new_list", array_repeat(lit(""), $"list_len"))

reference: https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/functions.html#array_repeat-org.apache.spark.sql.Column-org.apache.spark.sql.Column-

PySpark:

from pyspark.sql.functions import lit, array_repeat, col

# for each row, build an array containing the empty string repeated `list_len` times
df.withColumn("new_list", array_repeat(lit(""), col("list_len")))

reference: https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.functions.array_repeat
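For reference, here is a minimal self-contained sketch of the PySpark version. The SparkSession setup and sample DataFrame are illustrative only, and it assumes a Spark version whose array_repeat accepts a Column for the repeat count, as in the linked docs:

from pyspark.sql import SparkSession
from pyspark.sql.functions import lit, array_repeat, col

spark = SparkSession.builder.getOrCreate()

# sample data mirroring the question's example
df = spark.createDataFrame([(1,), (3,), (2,)], ["list_len"])

# array_repeat(lit(""), col("list_len")) builds, per row, an array of
# `list_len` empty strings
df.withColumn("new_list", array_repeat(lit(""), col("list_len"))).show()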

Upvotes: 1
