Reputation: 25
I want to create an array column from existing columns in PySpark.
--------------------------
col0 | col1 | col2 | col3
--------------------------
1    | a    | b    | c
--------------------------
2    | d    | e    | f
--------------------------
I want it like this:
---------------
col0 | col1
---------------
1    | [a,b,c]
---------------
2    | [d,e,f]
---------------
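In plain Python terms (just to illustrate the reshaping I am after, not actual PySpark code), the transformation looks like this:

```python
# Each input row (col0, col1, col2, col3) becomes (col0, [col1, col2, col3]).
rows = [(1, "a", "b", "c"), (2, "d", "e", "f")]
result = [(c0, [c1, c2, c3]) for c0, c1, c2, c3 in rows]
print(result)  # [(1, ['a', 'b', 'c']), (2, ['d', 'e', 'f'])]
```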
I was trying the array() function like this:
>>> new = df.select("col0",array("col1","col2","col3").alias("col1"))
but I am getting this error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'list' object is not callable
Does anyone have a solution for this?
Upvotes: 0
Views: 644
Reputation: 2003
You can use withColumn() to create the new column first, and then use select() to keep only the columns you want:
from pyspark.sql.functions import array

df = df.withColumn("col1", array("col1", "col2", "col3"))
df = df.select("col0", "col1")
As for the error: `TypeError: 'list' object is not callable` means that the name `array` in your session is bound to a plain Python list (it has been shadowed somewhere, e.g. by an earlier assignment), not to pyspark.sql.functions.array. Importing the function explicitly (`from pyspark.sql.functions import array`) also makes your original select() with .alias() work as written.
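For reference, here is a minimal plain-Python repro of that TypeError, assuming the name `array` was rebound to a list earlier in the session (the assignment below is hypothetical):

```python
array = ["col1", "col2"]  # hypothetical earlier assignment that shadows the name

try:
    array("col1", "col2", "col3")  # calling a list object, not a function
except TypeError as e:
    print(e)  # 'list' object is not callable
```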
Upvotes: 1