user3222101

Reputation: 1330

How to read multiple CSV files with different schemas in PySpark?

I have CSV files kept in subfolders of a given folder. Some of them use one column-naming format and some use another.

april_df = spark.read.option("header", True).option("inferSchema", True).csv('/mnt/range/2018_04_28_00_11_11/')

The command above picks up only one format and ignores the other. Is there a quick option for this, like mergeSchema for Parquet?

The format of some files is:

id, f_facing, l_facing, r_facing, remark

and of the others:

id, f_f, l_f, r_f, remark

There is also a chance that some columns will be missing in future files, so I need a robust way to handle this.

Upvotes: 2

Views: 3722

Answers (1)

Rob

Reputation: 478

There is no such option for CSV. Either the missing columns should be filled with null in the pipeline, or you will have to specify the schema before you import the files. If you have an idea of which columns might be missing in the future, you could pick a schema based on the length of df.columns, although that seems tedious.
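One way to sketch the "fill with null in the pipeline" approach: read each folder separately, rename the short column names to the canonical ones, then union with allowMissingColumns=True so absent columns become null. This assumes Spark 3.1+ (for that flag); the alias map and paths below are hypothetical examples, not part of the question.

```python
# Map each known alias to a canonical column name (hypothetical example
# based on the two formats shown in the question).
ALIASES = {
    "id": "id", "remark": "remark",
    "f_facing": "f_facing", "f_f": "f_facing",
    "l_facing": "l_facing", "l_f": "l_facing",
    "r_facing": "r_facing", "r_f": "r_facing",
}


def canonical_names(columns):
    """Strip stray whitespace and map known aliases to canonical names."""
    return [ALIASES.get(c.strip(), c.strip()) for c in columns]


def read_and_normalize(spark, path):
    """Read one CSV folder and rename its columns to the canonical set."""
    df = spark.read.option("header", True).option("inferSchema", True).csv(path)
    return df.toDF(*canonical_names(df.columns))


def read_all(spark, paths):
    """Union all folders; columns missing from a file come back as null."""
    from functools import reduce

    dfs = [read_and_normalize(spark, p) for p in paths]
    # Requires Spark 3.1+; fills absent columns with null instead of failing.
    return reduce(lambda a, b: a.unionByName(b, allowMissingColumns=True), dfs)
```

Usage would look like `read_all(spark, ['/mnt/range/2018_04_28_00_11_11/', ...])`. A future unknown column name just passes through `canonical_names` unchanged, so nothing breaks; it simply shows up as null in the files that lack it.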

Upvotes: 1
