Umi

Reputation: 137

Compare value of Dataframe column with list value

I have a Spark DataFrame with columns 'id' and 'articles', and a list of values 'a_list', as below.

df = spark.createDataFrame([(1, 4), (2, 3), (5, 6)], ("id", "articles"))

a_list = [1, 4, 6]

I am trying to compare each value of the DataFrame column "articles" with the values in the list: if a match is found, column 'E' should be set to 1, else 0.

I am using "isin" in the code below:

df['E'] = df.articles.isin(a_list).astype(int)

But I get:

TypeError: unexpected type: <type 'type'>

What am I missing here?

Upvotes: 3

Views: 3386

Answers (1)

akuiper

Reputation: 214957

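Provide the type as the string "int" instead of int: int is Python's built-in type, which Spark's astype (an alias of cast) does not recognize. Also, to create a column in a Spark DataFrame, use the withColumn method instead of direct assignment. Spark DataFrames do not support item assignment, and withColumn returns a new DataFrame rather than modifying df in place: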

df.withColumn('E', df.articles.isin(a_list).astype('int')).show()
+---+--------+---+
| id|articles|  E|
+---+--------+---+
|  1|       4|  1|
|  2|       3|  0|
|  5|       6|  1|
+---+--------+---+

Upvotes: 2
