Reputation: 25
I'm trying to read a CSV file into a Spark DataFrame. The file doesn't have a header row, but I want the DataFrame to have named columns. How can I do that? I'm not sure whether I'm doing this correctly; I wrote this command ->
val df = spark.read.format("csv").load("/path/genchan1.txt").show()
and got the column names _c0 and _c1. Then I tried to change the column names to the desired names using:
val df1 = df.withColumnRenamed("_c0","Series")
but I'm getting "value withColumnRenamed is not a member of Unit".
PS: I have already imported spark.implicits._ and spark.sql.functions.
Please help me understand whether there is a way to add column headers to the dataset, and why I'm getting this error.
Upvotes: 1
Views: 3408
Reputation: 1054
If you know the structure of the CSV file beforehand, defining a schema and applying it while loading the data is the better solution.
Sample code for quick reference:
import org.apache.spark.sql.types._

val customSchema = StructType(Array(
  StructField("Series", StringType, true),
  StructField("Column2", StringType, true),
  StructField("Column3", IntegerType, true),
  StructField("Column4", DoubleType, true)
))

val df = spark.read.format("csv")
  .option("header", "false") // since your file does not have a header row
  .schema(customSchema)
  .load("/path/genchan1.txt")

df.show()
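If you are on Spark 2.3 or later, the same schema can also be passed as a DDL-style string, which is a bit more compact. A minimal sketch, assuming the same column names and types as above:

// Sketch: DDL-style schema string (assumes Spark 2.3+ and the column names/types shown above)
val df = spark.read.format("csv")
  .option("header", "false")
  .schema("Series STRING, Column2 STRING, Column3 INT, Column4 DOUBLE")
  .load("/path/genchan1.txt")

df.printSchema() // verify that the named columns and types were applied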
Upvotes: 1
Reputation: 6323
The return type of show is Unit. Please remove show from the end:
val df = spark.read.format("csv").load("/path/genchan1.txt")
df.show()
You can then use all DataFrame functionality:
val df1 = df.withColumnRenamed("_c0","Series")
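If you want to rename every column in one go instead of chaining withColumnRenamed calls, toDF with the new names also works. A minimal sketch, where "Channel" is just a placeholder for whatever your second column should be called:

// Sketch: rename all columns at once; "Channel" is a hypothetical name for _c1
val df1 = df.toDF("Series", "Channel")

// Equivalent with chained renames
val df2 = df
  .withColumnRenamed("_c0", "Series")
  .withColumnRenamed("_c1", "Channel")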
Upvotes: 3