Reputation: 1448
I am using Apache Spark with Scala.
I have a CSV file that does not have column names in the first row. It looks like this:
28,Martok,49,476
29,Nog,48,364
30,Keiko,50,175
31,Miles,39,161
The columns represent ID, name, age, numOfFriends.
In my Scala object, I am creating a DataFrame from the CSV file using SparkSession as follows:
val spark = SparkSession.builder.master("local[*]").getOrCreate()
val df = spark.read.option("inferSchema","true").csv("../myfile.csv")
df.printSchema()
When I run the program, the result is:
|-- _c0: integer (nullable = true)
|-- _c1: string (nullable = true)
|-- _c2: integer (nullable = true)
|-- _c3: integer (nullable = true)
How can I add names to the columns in my dataset?
Upvotes: 10
Views: 18849
Reputation: 11
The toDF method can be used to pass in the column names when using the Spark Java API.
Example:
Dataset<Row> rowsWithTitle = sparkSession.read().option("header", "true").option("delimiter", "\t").csv("file").toDF("h1", "h2");
Upvotes: 1
Reputation: 22449
You can use toDF to specify column names when reading the CSV file:
val df = spark.read.option("inferSchema","true").csv("../myfile.csv").toDF(
"ID", "name", "age", "numOfFriends"
)
Or, if you already have the DataFrame created, you can rename its columns as follows:
val newColNames = Seq("ID", "name", "age", "numOfFriends")
val df2 = df.toDF(newColNames: _*)
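As a quick sanity check (a minimal sketch, assuming the df2 from above and the types inferred from the asker's file), printing the schema should now show the new names:
df2.printSchema()
// root
//  |-- ID: integer (nullable = true)
//  |-- name: string (nullable = true)
//  |-- age: integer (nullable = true)
//  |-- numOfFriends: integer (nullable = true)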
Upvotes: 26