Reputation: 988
I am trying to load a CSV file and use its second line as the header. How can I achieve this? Please let me know. Thanks.
file_location = "/mnt/test/raw/data.csv"
file_type = "csv"
infer_schema = "true"
delimiter = ","
data = spark.read.format(file_type) \
.option("inferSchema", infer_schema) \
.option("header", "false") \
.option("sep", delimiter) \
.load(file_location)
Upvotes: 3
Views: 7033
Reputation: 128
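First read the data as an RDD, drop the first line, and then pass the RDD to spark.read.csv():
# Read the file as an RDD of text lines (sc is the SparkContext)
data = sc.textFile('/mnt/test/raw/data.csv')
firstRow = data.first()
# Drop the first line so that the second line is treated as the header
data = data.filter(lambda row: row != firstRow)
df = spark.read.csv(data, header=True)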
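One caveat with filtering by value is that it also removes any later line whose text is identical to the first line. As a rough sketch of an alternative, the lines can be dropped by position with zipWithIndex() instead. The helper names is_after_skipped and read_csv_skipping_lines and the n_skip parameter are hypothetical, made up for this illustration, and the sketch assumes a single input file read in order and a Spark version (2.2 or later) where spark.read.csv accepts an RDD of strings.

```python
def is_after_skipped(indexed_line, n_skip=1):
    """Return True when a (line, index) pair lies past the first n_skip lines."""
    _line, index = indexed_line
    return index >= n_skip


def read_csv_skipping_lines(spark, path, n_skip=1, sep=","):
    """Hypothetical helper: drop the first n_skip lines, then use the next line as header."""
    lines = spark.sparkContext.textFile(path)
    kept = (
        lines.zipWithIndex()  # pairs of (line, position in the file)
        .filter(lambda pair: is_after_skipped(pair, n_skip))
        .map(lambda pair: pair[0])  # back to plain text lines
    )
    # The first remaining line is now the header row.
    return spark.read.csv(kept, header=True, inferSchema=True, sep=sep)
```

With an existing session this could be called as df = read_csv_skipping_lines(spark, "/mnt/test/raw/data.csv"). Keep in mind that zipWithIndex() may trigger an extra Spark job when the RDD has more than one partition.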
For a reference on DataFrame functions, see the link below. It covers the DataFrame operations you are likely to need. For a specific version of Spark, replace "latest" in the URL with that version:
https://spark.apache.org/docs/latest/api/python/pyspark.sql.html
Upvotes: 3