mesh

Reputation: 937

Create a new Spark dataframe from an existing dataframe based on a condition

I want to create a new dataframe from an existing dataframe based on a condition.

df1 =>

id1  id2
11   i
11   k
20   l
20   m
20   n
31   k
31   j

If id2 in df1 is alphabetically greater than k, the id2 value in the new dataframe df2 should be 1, otherwise 0, with one row per id1, as shown below:

df2 =>

id1  id2
11   0
20   1
31   0

Upvotes: 0

Views: 624

Answers (1)

Cena

Reputation: 3419

Using F.when:

from pyspark.sql import functions as F
df1.withColumn("id2", F.when(F.col("id2") > "k", 1).otherwise(0)).show()

+---+---+
|id1|id2|
+---+---+
| 11|  0|
| 11|  0|
| 20|  1|
| 20|  1|
| 20|  1|
| 31|  0|
| 31|  0|
+---+---+

Add .distinct() before .show() if you want to drop the duplicate rows and get one row per id1.

Upvotes: 1
