Reputation: 25
I want to create an array column from existing columns in PySpark.
--------------------------
col0 | col1 | col2 | col3
--------------------------
1    | a    | b    | c
--------------------------
2    | d    | e    | f
--------------------------
I want it like this:
---------------
col0 | col1
---------------
1    | [a,b,c]
---------------
2    | [d,e,f]
---------------
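In plain Python terms (just to illustrate the reshaping I am after, not actual PySpark code), the transformation looks like this:

```python
# Each input row (col0, col1, col2, col3) becomes (col0, [col1, col2, col3]).
rows = [(1, "a", "b", "c"), (2, "d", "e", "f")]
result = [(c0, [c1, c2, c3]) for c0, c1, c2, c3 in rows]
print(result)  # [(1, ['a', 'b', 'c']), (2, ['d', 'e', 'f'])]
```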
I was trying the array() function like this:
>>> new = df.select("col0",array("col1","col2","col3").alias("col1"))
but I am getting this error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'list' object is not callable
Does anyone have a solution for this?
Upvotes: 0
Views: 644
Reputation: 2003
You can use withColumn() to create the new column first, and then use select() to keep only the columns you want:
from pyspark.sql.functions import array

df = df.withColumn("col1", array("col1", "col2", "col3"))
df = df.select("col0", "col1")
As for the error: `TypeError: 'list' object is not callable` means that the name `array` in your session is bound to a plain Python list (it has been shadowed somewhere, e.g. by an earlier assignment), not to pyspark.sql.functions.array. Importing the function explicitly (`from pyspark.sql.functions import array`) also makes your original select() with .alias() work as written.
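For reference, here is a minimal plain-Python repro of that TypeError, assuming the name `array` was rebound to a list earlier in the session (the assignment below is hypothetical):

```python
array = ["col1", "col2"]  # hypothetical earlier assignment that shadows the name

try:
    array("col1", "col2", "col3")  # calling a list object, not a function
except TypeError as e:
    print(e)  # 'list' object is not callable
```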
Upvotes: 1