Onkar Jadhav

Reputation: 25

Creating an array column raises TypeError: 'list' object is not callable in PySpark

I want to create an array column from existing columns in PySpark.

--------------------------
col0 | col1 | col2 | col3
--------------------------
1    |a     |b     |c
--------------------------
2    |d     |e     |f
--------------------------

I want the result to look like this:

-------------
col0 | col1 
-------------
1    |[a,b,c]
-------------
2    |[d,e,f]
--------------

I tried the array() function like this:

>>> new = df.select("col0",array("col1","col2","col3").alias("col1"))

but I get this error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'list' object is not callable

Does anyone have a solution for this?

Upvotes: 0

Views: 644

Answers (1)

dsk

Reputation: 2003

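You can create the new column with withColumn() and then use select() to pick the columns you need. Make sure array is imported from pyspark.sql.functions:

from pyspark.sql.functions import array
df = df.withColumn("col1", array("col1","col2","col3"))
df = df.select("col0", "col1")

The error most likely does not come from .alias(), which is valid inside select(). "'list' object is not callable" is raised when the name being called is bound to a Python list, so array in your session probably refers to a list (for example, after an assignment such as array = [...]) rather than to pyspark.sql.functions.array.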

Upvotes: 1
