Reputation: 937
I want to create a new dataframe from an existing dataframe based on a condition.
df1=>
id1 id2
11 i
11 k
20 l
20 m
20 n
31 k
31 j
If id2 in df1 is alphabetically greater than k, the value should become 1, otherwise 0. The new dataframe df2 should look like this:
df2=>
id1 id2
11 0
20 1
31 0
Upvotes: 0
Views: 624
Reputation: 3419
Using F.when:
from pyspark.sql import functions as F
df1.withColumn("id2", F.when(F.col("id2") > "k", 1).otherwise(0)).show()
+---+---+
|id1|id2|
+---+---+
| 11|  0|
| 11|  0|
| 20|  1|
| 20|  1|
| 20|  1|
| 31|  0|
| 31|  0|
+---+---+
Add .distinct() if you want to deduplicate the rows.
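One caveat: .distinct() only gives one row per id1 when every row of that id1 ends up with the same flag, which happens to be true for the sample data. If an id1 could have values on both sides of k, you would get two rows for it. Here is a small plain-Python sketch (no Spark needed) that models the logic so you can check the expected results by hand. The names flag_rows, collapse_distinct and collapse_any are made up for illustration, and the "1 if any row qualifies" rule is only my assumption about what you want. In Spark, I believe the matching collapse would be a groupBy("id1").agg(F.max("id2").alias("id2")) on the flagged column instead of .distinct().

```python
# Plain-Python model of the flag-then-collapse logic (not Spark code).
rows = [(11, "i"), (11, "k"), (20, "l"), (20, "m"), (20, "n"), (31, "k"), (31, "j")]


def flag_rows(rows, threshold="k"):
    """Mimic F.when(col > threshold, 1).otherwise(0) on the id2 column."""
    # String comparison is case-sensitive: uppercase letters sort before lowercase.
    return [(id1, 1 if id2 > threshold else 0) for id1, id2 in rows]


def collapse_distinct(flagged):
    """Mimic .distinct(): drop exact duplicate rows (sorted only for stable display)."""
    return sorted(set(flagged))


def collapse_any(flagged):
    """Assumed rule: one row per id1, flagged 1 if any of its rows was flagged."""
    best = {}
    for id1, flag in flagged:
        best[id1] = max(best.get(id1, 0), flag)
    return sorted(best.items())


flagged = flag_rows(rows)
print(collapse_distinct(flagged))  # [(11, 0), (20, 1), (31, 0)]
print(collapse_any(flagged))       # [(11, 0), (20, 1), (31, 0)]

# Hypothetical extra row: id1 = 11 now has values on both sides of "k".
mixed = flag_rows(rows + [(11, "z")])
print(collapse_distinct(mixed))    # [(11, 0), (11, 1), (20, 1), (31, 0)]
print(collapse_any(mixed))         # [(11, 1), (20, 1), (31, 0)]
```

On your sample data both approaches agree, so .distinct() is enough there. The difference only shows up with mixed groups like the hypothetical (11, "z") row.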
Upvotes: 1