How to add a list as a new column to an existing PySpark DataFrame?

Question

I have an existing DataFrame and want to add a new column to it:

from pyspark.sql import SQLContext, Row

sqlContext = SQLContext(sc)  # sc is the existing SparkContext

# Build a single-column DataFrame from a Python list
numbers = [1, 2, 30, 4]
rdd1 = sc.parallelize(numbers)
row_rdd = rdd1.map(lambda x: Row(x))
test_df = sqlContext.createDataFrame(row_rdd, ['numbers'])
-------------------------------------------------------------------------
test_df.show()
-------------------------------------------------------------------------
+-------+
|numbers|
+-------+
|      1|
|      2|
|     30|
|      4|
+-------+
-------------------------------------------------------------------------

# Attempt to add a list as a new column to the existing DataFrame
rating = [40, 32, 12, 21]
rdd2 = sc.parallelize(rating)
row_rdd2 = rdd2.map(lambda x: Row(x))
test_df2 = test_df.withColumn("rating", row_rdd2)  # raises the error shown below

Expected output:

+-------+------+
|numbers|rating|
+-------+------+
|      1|    40|
|      2|    32|
|     30|    12|
|      4|    21|
+-------+------+

Actual result:

AssertionError: col should be Column

How can I add a list as a new column to an existing PySpark DataFrame?

Thanks.


Answers (1)
