t1808

Reputation: 71

How to select distinct and non-null values from a dataframe column in pyspark

How can I select the distinct, non-null values from a DataFrame column in PySpark?

Upvotes: 1

Views: 3073

Answers (1)

t1808

Reputation: 71

OK, I figured it out. The following command selects all the unique UserIDs from the column and excludes rows where the value is null:

from pyspark.sql.functions import col  # needed for col()
df.select("UserID").distinct().where(col("UserID").isNotNull())

Still, I believe there may be a better alternative.

Upvotes: 1
