Reputation: 159
I have a dataset containing the two rows below:
s.no,name,Country
101,xyz,India,IN
102,abc,UnitedStates,US
I am trying to keep the comma in the last column (so India,IN stays together) while treating the other commas as delimiters, and to get that output using spark-shell. I tried the code below, but it gave me a different output.
val df = sqlContext.read.format("com.databricks.spark.csv")
  .option("header", "true")
  .option("delimiter", ",")
  .option("escape", "\"")
  .load("/user/username/data.csv")
df.show()
The output it gave me is:
+----+-----+------------+
|s.no| name|     Country|
+----+-----+------------+
| 101| xyz| India|
| 102| abc|UnitedStates|
+----+-----+------------+
But I am expecting the output to be like below. What am I missing here? Can anyone help me?
s.no name Country
101 xyz India,IN
102 abc UnitedStates,US
Upvotes: 1
Views: 1239
Reputation: 23119
I suggest reading all the fields by providing a schema and ignoring the header present in the data. Because the header has only three column names while each data row has four fields, Spark drops the extra token per row, which is why India,IN is truncated to India. With an explicit four-column schema, you can read everything and then concatenate the last two columns, as below:
import org.apache.spark.sql.Encoders
import org.apache.spark.sql.functions.{concat, lit}
import spark.implicits._

// Four fields per row, so model the unnamed trailing field as its own column
case class Data(sno: String, name: String, country: String, country1: String)

// Derive the schema from the case class
val schema = Encoders.product[Data].schema

val df = spark.read
  .option("header", true) // with an explicit schema this just skips the header line
  .schema(schema)
  .csv("data.csv")
  .withColumn("Country", concat($"country", lit(", "), $"country1")) // merge the split country fields
  .drop("country1")

df.show(false)
Output:
+---+----+----------------+
|sno|name|Country         |
+---+----+----------------+
|101|xyz |India, IN       |
|102|abc |UnitedStates, US|
+---+----+----------------+
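Note that this gives "India, IN" with a space after the comma. If you need the output to match your expected format exactly (India,IN), a minor variation, assuming the same schema and file as above, is to use concat_ws with "," as the separator:

import org.apache.spark.sql.functions.concat_ws

val df2 = spark.read
  .option("header", true)
  .schema(schema)
  .csv("data.csv")
  .withColumn("Country", concat_ws(",", $"country", $"country1")) // no space: India,IN
  .drop("country1")

df2.show(false)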
Hope this helps!
Upvotes: 1