Reputation: 99
I have the code below to read an XML file:
Dataset<Row> dataset1 = SparkConfigXMLProcessor.sparkSession.read().format("com.databricks.spark.xml")
.option("rowTag", properties.get(EventHubConsumerConstants.IG_ORDER_TAG).toString())
.load(properties.get("C:\\inputOrders.xml").toString());
One of the column values contains a newline character. I want to replace it with some other character, or just remove it. Please help.
Upvotes: 4
Views: 19851
Reputation: 73
This is what I used. I usually add a tab (\t), too. Matching both \r and \n covers Unix and modern macOS (\n), Windows (\r\n), and classic Mac OS (\r) line endings.
Dataset<Row> newDF = dataset1.withColumn("menuitemname", regexp_replace(col("menuitemname"), "\n|\r", ""));
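Spark's regexp_replace takes a Java regular expression, so the pattern can be tried on a plain String before running it against a DataFrame. Here is a minimal sketch of that check, not the original poster's code: the class name NewlinePatternDemo, both helper methods, and the sample value are hypothetical and only for illustration.

```java
// Hypothetical demo class; not part of the original answer.
final class NewlinePatternDemo {

    // Same pattern and replacement as the regexp_replace call above.
    static String stripLineBreaks(String value) {
        return value.replaceAll("\n|\r", "");
    }

    // Variant that also removes tabs, as mentioned in the answer.
    static String stripLineBreaksAndTabs(String value) {
        return value.replaceAll("\n|\r|\t", "");
    }

    public static void main(String[] args) {
        // Made-up value with a Windows line ending, a tab, and a trailing line feed.
        String sample = "Veg\r\nBurger\tLarge\n";
        System.out.println(stripLineBreaks(sample));
        System.out.println(stripLineBreaksAndTabs(sample));
    }
}
```

If the words on either side of the line break should stay separated, pass " " as the replacement instead of "".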
Upvotes: 3
Reputation: 1046
dataset1.withColumn("menuitemname_clean", regexp_replace(col("menuitemname"), "[\n\r]", " "))
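The code above will work. It replaces each \n or \r with a space and writes the result to a new column, menuitemname_clean.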
Upvotes: 8
Reputation: 99
The code below resolved my issue:
Dataset<Row> newDF = dataset1.withColumn("menuitemname", regexp_replace(col("menuitemname"), "[\\n]", ""));
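Note that the pattern "[\\n]" matches only the line feed character, so a value with Windows line endings (\r\n) would keep its \r. The sketch below shows the difference on a plain String, assuming the input may contain carriage returns; the class name LineFeedOnlyDemo, the helper methods, and the sample value are hypothetical.

```java
// Hypothetical demo class; not part of the original answer.
final class LineFeedOnlyDemo {

    // Same pattern as the regexp_replace call above: removes \n only.
    static String removeLineFeeds(String value) {
        return value.replaceAll("[\\n]", "");
    }

    // Wider character class that removes both \r and \n.
    static String removeAllLineBreaks(String value) {
        return value.replaceAll("[\\r\\n]", "");
    }

    public static void main(String[] args) {
        String sample = "Veg\r\nBurger"; // made-up value with a Windows line ending
        // The carriage return is still present after removing only \n.
        System.out.println(removeLineFeeds(sample).contains("\r"));
        System.out.println(removeAllLineBreaks(sample));
    }
}
```

If the XML file only ever contains \n, the narrower pattern is enough; the wider class is a safer default when the source of the file is unknown.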
Upvotes: -3