Xi12

Reputation: 1223

How to compare 2 columns in a PySpark DataFrame using assert functions

I am using the code below to compare two columns in a DataFrame. I don't want to do it in pandas. Can someone show me how to do the comparison using Spark DataFrames?

    df1=context.spark.read.option("header",True).csv("./test/input/test/Book1.csv")
    df1=df1.withColumn("Curated", dataclean.clean_email(col("email")))
    df1.show()
    assert_array_almost_equal(df1['expected'], df1['Curated'],verbose=True)

Upvotes: 1

Views: 975

Answers (1)

abiratsis

Reputation: 7326

An efficient approach is to stop at the first difference instead of comparing every row. One way to do that is a left-anti join, which keeps only the rows that have no match on the join condition. The assertion should pass when no such row exists:

    assert df1.join(df1, df1['expected'] == df1['Curated'], "leftanti").first() is None

Upvotes: 1
