Reputation: 16723
My data is in a CSV file. The file doesn't have a header row:
United States Romania 15
United States Croatia 1
United States Ireland 344
Egypt United States 15
If I read it, Spark
creates names for the columns automatically.
scala> val data = spark.read.csv("./data/flight-data/csv/2015-summary.csv")
data: org.apache.spark.sql.DataFrame = [_c0: string, _c1: string ... 1 more field]
Is it possible to provide my own names for the columns when reading the file, so that I don't get _c0, _c1, and so on? For example, I want Spark to use DEST, ORIG, and count as the column names. I don't want to add a header row to the CSV to do this.
Upvotes: 1
Views: 1777
Reputation: 561
It's better to define a schema (StructType) first, then load the CSV data using that schema.
Here is how to define schema:
import org.apache.spark.sql.types._

val schema = StructType(Array(
  StructField("DEST", StringType, true),
  StructField("ORIG", StringType, true),
  StructField("count", IntegerType, true)
))
Then load the DataFrame with that schema:
val df = spark.read.schema(schema).csv("./data/flight-data/csv/2015-summary.csv")
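To confirm the names and types were applied, you can print the schema (standard DataFrame call; the output below assumes the file loads as expected):

scala> df.printSchema()
root
 |-- DEST: string (nullable = true)
 |-- ORIG: string (nullable = true)
 |-- count: integer (nullable = true)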
Hopefully this helps.
Upvotes: 0
Reputation: 3367
Yes, you can. Use the toDF function of the DataFrame:
val data = spark.read.csv("./data/flight-data/csv/2015-summary.csv").toDF("DEST", "ORIG", "count")
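One caveat: toDF only renames the columns; without an explicit schema, spark.read.csv reads every column as a string. If you also want count typed as a number, you can turn on Spark's standard inferSchema option (a sketch; note that inference costs an extra pass over the file):

val data = spark.read
  .option("inferSchema", "true")  // infer column types instead of defaulting to string
  .csv("./data/flight-data/csv/2015-summary.csv")
  .toDF("DEST", "ORIG", "count")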
Upvotes: 2