Reputation: 291
I have a tab-delimited file with comments denoted by ##. I would like to read the file into a DataFrame, using something like:
val targetDF = sparkSession.read.format("csv")
.option("header", "true")
.option("inferSchema", "true")
.option("delimiter", "\t")
.option("comment", "##")
.load(pathToFile)
When I try this I get a runtime exception: java.lang.RuntimeException: comment cannot be more than one character. What is the best way to deal with this?
Upvotes: 1
Views: 2441
Reputation: 5315
Use just a single '#'; each line starting with '#' will then be treated as a comment. This is what the API documentation says:
comment (default empty string): sets the single character used for skipping lines beginning with this character. By default, it is disabled.
But make sure that no valid line in your file starts with this character.
val targetDF = sparkSession.read.format("csv")
.option("header", "true")
.option("inferSchema", "true")
.option("delimiter", "\t")
.option("comment", "#")
.load(pathToFile)
Edit: because your records can contain a single '#', you'll have to omit the comment option and either filter your DataFrame manually afterwards, or remove any line starting with '##' from your file before parsing it.
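One way to do the pre-filtering without touching the file on disk is to read it as plain text first, drop the '##' lines, and then parse the result as CSV. A rough sketch (assuming Spark 2.2+, where `DataFrameReader.csv` accepts a `Dataset[String]`, and reusing `pathToFile` from the question):

```scala
import org.apache.spark.sql.SparkSession

val sparkSession = SparkSession.builder().getOrCreate()

// Read the raw lines and drop the '##' comment lines before parsing.
val lines = sparkSession.read.textFile(pathToFile)   // Dataset[String]
  .filter(line => !line.startsWith("##"))

// Parse the remaining lines as tab-delimited CSV.
val targetDF = sparkSession.read
  .option("header", "true")
  .option("inferSchema", "true")
  .option("delimiter", "\t")
  .csv(lines)
```

This keeps records that merely contain '#' intact, since only lines that begin with '##' are removed.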
Upvotes: 2