Reputation: 395
I ran the Great Expectations check expect_column_values_to_be_unique on one of my columns, and it produced the result below. There are 62 duplicates in total, but the output list returns only 20 elements. How can I retrieve all duplicate records in that column?
df.expect_column_values_to_be_unique('A')
{
  "exception_info": null,
  "expectation_config": {
    "expectation_type": "expect_column_values_to_be_unique",
    "kwargs": {
      "column": "A",
      "result_format": "BASIC"
    },
    "meta": {}
  },
  "meta": {},
  "success": false,
  "result": {
    "element_count": 100,
    "missing_count": 0,
    "missing_percent": 0.0,
    "unexpected_count": 62,
    "unexpected_percent": 62.0,
    "unexpected_percent_nonmissing": 62.0,
    "partial_unexpected_list": [
      37,
      62,
      72,
      53,
      22,
      61,
      95,
      21,
      64,
      59,
      77,
      53,
      0,
      22,
      24,
      46,
      0,
      16,
      78,
      60
    ]
  }
}
Upvotes: 0
Views: 1680
Reputation: 56
You're currently passing result_format as BASIC, which returns only a partial list of unexpected values. To get the level of detail you're looking for, pass result_format as COMPLETE for this Expectation to get the full list of unexpected values. For example:
df.expect_column_values_to_be_unique(column="A", result_format="COMPLETE")
See the Great Expectations documentation for more on result_format.
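If you want to cross-check the full list independently of Great Expectations, here is a minimal pure-Python sketch. It assumes the expectation flags every occurrence of a repeated value, which is consistent with 53, 22 and 0 each appearing twice in the partial list above. The function name all_duplicates and the sample data are hypothetical and only stand in for column A.

```python
from collections import Counter


def all_duplicates(values):
    """Return every value that occurs more than once, keeping each occurrence."""
    counts = Counter(values)
    # Keep the original order so the result lines up with the column.
    return [v for v in values if counts[v] > 1]


# Hypothetical sample data standing in for column "A".
sample = [37, 62, 53, 22, 53, 0, 22, 0, 16]
duplicates = all_duplicates(sample)
print(len(duplicates), duplicates)
```

The length of this list is the figure to compare against unexpected_count, and its contents against the full list of unexpected values returned with COMPLETE.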
Upvotes: 2
Reputation: 5125
I think you are calling "show" without parameters, which by default displays only the first 20 rows. To see more, pass the number of rows you want. The call below shows 200 rows and does not truncate the column values:
df.select( col("*") ).show(200,false)
Upvotes: 0