Reputation: 11
I am using Spark 3.3.1 to read input CSV file having below header and value
ID, CREATE_DATE
1, 14:42:23.0
I'm passing only time(HH:MM:SS.SSS) where as DATE(YYYY-MM-DD) is missing in CREATE_DATE field and reading CREATE_DATE field as TimestampType() with defined schema with below code.
# Define schema
sparkCustomSchema = StructType([StructField('ID', IntegerType(), True), StructField('CREATE_DATE', TimestampType(), True),StructField('_corrupt_record', StringType(), True)])
# Read CSV with Schema
input_df = sparkSession.read.format("csv").schema(sparkCustomSchema).option("sep", ",").option("header", "true").option(
"mode", "PERMISSIVE").option("timestampFormat", "yyyy-MM-dd HH:mm:ss.SSS").option(
"ignoreLeadingWhiteSpace", "true").option("ignoreTrailingWhiteSpace", "true").option(
"treatEmptyValuesAsNulls", "true").csv(csv_input_path)
I'm gettig below result when input_df.show()
ID | CREATE_DATE | _corrupt_record |
---|---|---|
1 | 2023-07-09 14:42:23 | null |
In the above result, current date 2023-07-09 automatically got inserted along with timestamp even though date is missing in input csv record. Spark 3.3.1 not considering it as corrupt record for input 14:42:23.0. but in Spark 2.4 It is considering as corrupt record and giving below output.
Spark 2.4 Output:
ID | CREATE_DATE | _corrupt_record |
---|---|---|
null | null | 1, 14:42:23 |
I tried by setting spark.sql.legacy.timeParserPolicy
property to LEGACY and CORRECTED but no change in result. Still Inserting current date in given timestamp and not considering it as corrupt record.
Is there a way, how to mark this record as corrupt and achieve same result as spark 2.4 for missing date record in spark 3.3.1
I tried to run the code by setting up below property but still not considering it as corrupt record and inserting current_date along with give timestamp
sparkSession.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")
# sparkSession.conf.set("spark.sql.legacy.timeParserPolicy", "CORRECTED")
How to achieve the same result as spark 2.4 in spark 3.3.1
Upvotes: 1
Views: 97