Reputation: 1330
I have CSV files kept in sub-folders of a given folder, and some of them use one column-name format while others use another.
april_df = spark.read.option("header", True).option("inferSchema", True).csv('/mnt/range/2018_04_28_00_11_11/')
The above command picks up only one of the formats and ignores the other. Is there a quick option for this, like mergeSchema for Parquet?
The header of some files looks like:
id, f_facing, l_facing, r_facing, remark
while others have:
id, f_f, l_f, r_f, remark
There is also a chance that some columns will be missing entirely in future files, so I need a robust way to handle this.
Upvotes: 2
Views: 3722
Reputation: 478
There is no mergeSchema-style option for CSV. Either the missing columns have to be filled with null in the pipeline, or you will have to specify the schema before you import the files. If you have an idea of which columns might be missing in the future, you could pick the schema based on the length of df.columns, although that seems tedious.
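A minimal sketch of the null-filling route, assuming PySpark and the headers from the question: read each sub-folder separately, trim and rename the headers to one canonical set, add any missing columns as nulls, then combine everything with unionByName. The rename map and the second path are illustrative, not taken from your data.

from functools import reduce
from pyspark.sql.functions import lit

CANONICAL = ["id", "f_facing", "l_facing", "r_facing", "remark"]

# Known alternate spellings mapped to the canonical names (extend as needed).
RENAMES = {"f_f": "f_facing", "l_f": "l_facing", "r_f": "r_facing"}

def normalize(df):
    # Strip stray whitespace from headers like "f_facing " before matching.
    df = df.toDF(*[c.strip() for c in df.columns])
    for old, new in RENAMES.items():
        if old in df.columns:
            df = df.withColumnRenamed(old, new)
    # Fill any column missing from this file with nulls (cast so the union
    # below does not mix NullType with the inferred types).
    for col in CANONICAL:
        if col not in df.columns:
            df = df.withColumn(col, lit(None).cast("string"))
    return df.select(CANONICAL)

# Sub-folder paths are illustrative; on Databricks you could discover them
# with dbutils.fs.ls("/mnt/range/").
paths = [
    "/mnt/range/2018_04_28_00_11_11/",
    "/mnt/range/2018_04_29_00_12_05/",
]

dfs = [
    normalize(spark.read.option("header", True).option("inferSchema", True).csv(p))
    for p in paths
]
merged = reduce(lambda a, b: a.unionByName(b), dfs)

If the files can drift in ways the rename map does not cover, the normalize step is also where you could branch on len(df.columns) and apply an explicit schema instead.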
Upvotes: 1