Reputation: 87
I have two CSV files. The first contains only one row, which is the header. The second contains the values. I want to create a DataFrame that takes its headers from the single row of the first CSV and its values from all rows of the second. Both files have the same number of fields, from _c0 through _c1000 (about 1,000 columns). Column types can differ between the two files, but the column names and the number of columns will be the same. Below is an example snippet. I am using Databricks (PySpark). Any help is appreciated.
Upvotes: 0
Views: 156
Reputation: 1857
You can take the schema that results from reading the first file and apply it when reading the second file:
df1 = spark.read.option('header', True).csv('<path to the file with header>')
df2 = spark.read.schema(df1.schema).csv('<path to the file without header>')
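One thing to keep in mind: without `inferSchema`, the CSV reader types every column as a string, so the schema taken from a header-only file will be all strings. If you would rather have Spark infer the types from the data file, a possible variant is to read the data without a header and then rename the columns with `toDF`. This is only a sketch; the function name `read_with_external_header` and its parameters are made up for illustration, and `spark` is assumed to be the session Databricks already provides.

```python
def read_with_external_header(spark, header_path, data_path):
    """Hypothetical helper: column names from one CSV, rows from another."""
    # Read the header-only file just to get the column names.
    names = spark.read.option("header", True).csv(header_path).columns

    # Read the data file without a header and let Spark infer column types.
    data = (
        spark.read
        .option("header", False)
        .option("inferSchema", True)
        .csv(data_path)
    )

    # Names are assigned by position, so the column counts must match.
    if len(names) != len(data.columns):
        raise ValueError(
            f"header has {len(names)} columns, data has {len(data.columns)}"
        )

    # Rename the default _c0, _c1, ... columns to the names from the header.
    return data.toDF(*names)
```

You would call it as `df = read_with_external_header(spark, '<path to the file with header>', '<path to the file without header>')`. Note that `inferSchema` makes Spark take an extra pass over the data, which may matter for large files.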
Upvotes: 0