Reputation: 71
How can I select distinct, non-null values from a DataFrame column in PySpark?
Upvotes: 1
Views: 3073
Reputation: 71
OK, I figured it out. The following command selects all the unique UserIDs from the column and excludes empty (null) rows:
from pyspark.sql.functions import col
df.select('UserID').distinct().where(col('UserID').isNotNull())
Still, I believe there may be a better alternative.
Upvotes: 1